Path: blob/master/model_selection/partial_dependence/__pycache__/partial_dependence.cpython-35.pyc
# Reconstructed source for the compiled bytecode in this file. The import
# names below are recovered from the bytecode's constant table; the two
# leading module imports are truncated in the dump and are not recoverable,
# and Parallel/delayed may come from joblib or sklearn.externals.joblib.
from math import ceil

import matplotlib.pyplot as plt
from joblib import Parallel, delayed
from matplotlib.gridspec import GridSpec

__all__ = ['PartialDependenceExplainer']


class PartialDependenceExplainer:
    """
Partial Dependence explanation [1]_.
- Supports scikit-learn-like classification and regression estimators.
- Works for both numerical and categorical columns.
Parameters
----------
estimator : sklearn-like classifier
Model that was fitted on the data.
n_grid_points : int, default 50
Number of grid points used in place of
the original numeric data. Only used
if the targeted column is numeric; for a categorical
column, the number of grid points will always be
the number of distinct categories in that column.
A smaller number of grid points approximates the
full set of unique values and results in
faster computation.
batch_size : int or 'auto', default 'auto'
Compute the partial dependence predictions batch by batch to limit
memory usage; the default batch size is
ceil(number of rows in the data / number of grid points used).
n_jobs : int, default 1
Number of jobs to run in parallel. If the model already fits
extremely fast on the data, specify 1 so that there's no
overhead from spawning separate processes to do the computation.
verbose : int, default 1
The verbosity level: if non-zero, progress messages are printed.
Above 50, the output is sent to stdout. The frequency of the messages
increases with the verbosity level. If it is more than 10, all
iterations are reported.
pre_dispatch : int or str, default '2*n_jobs'
Controls the number of jobs that get dispatched during parallel
execution. Reducing this number can be useful to avoid an
explosion of memory consumption when more jobs get dispatched
than CPUs can process. Possible inputs:
- None, in which case all the jobs are immediately
created and spawned. Use this for lightweight and
fast-running jobs, to avoid delays due to on-demand
spawning of the jobs
- An int, giving the exact number of total jobs that are
spawned
- A string, giving an expression as a function of n_jobs,
as in '2*n_jobs'
Attributes
----------
feature_name_ : str
The feature_name passed to .fit, stored unmodified and
used by subsequent methods.
feature_type_ : str
The feature_type passed to .fit, stored unmodified and
used by subsequent methods.
feature_grid_ : 1d ndarray
Unique grid points that were used to generate the
partial dependence result.
results : list of DataFrame
Partial dependence results. For a classification
estimator, each element of the list holds the result
for one class; for a regression estimator, the list
contains a single element.
References
----------
.. [1] `Python partial dependence plot toolbox
<https://github.com/SauceCat/PDPbox>`_
    """

    def __init__(self, estimator, n_grid_points=50, batch_size='auto',
                 n_jobs=1, verbose=1, pre_dispatch='2*n_jobs'):
        # per the bytecode, only these five attributes are stored here;
        # batch_size is resolved later (see the docstring for its default)
        self.n_jobs = n_jobs
        self.verbose = verbose
        self.estimator = estimator
        self.pre_dispatch = pre_dispatch
        self.n_grid_points = n_grid_points

    # a subsequent method (.fit, per the docstring) begins here;
    # its body is truncated in this excerpt
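The core computation the docstring describes — sweep one column over a grid of values, predict, and average — can be sketched independently of the class. The helper below is a hypothetical minimal implementation (the function name and signature are my own, not from this file), assuming a scikit-learn-style estimator with a `.predict` method; the real class additionally evaluates predictions batch by batch via joblib's `Parallel`, which this single-pass sketch omits.

```python
import numpy as np


def partial_dependence_1d(estimator, X, col, n_grid_points=50):
    """Average model predictions while sweeping column `col` over a grid."""
    grid = np.unique(X[:, col])
    if grid.size > n_grid_points:
        # approximate the full set of unique values with evenly
        # spaced percentiles, mirroring the n_grid_points parameter
        quantiles = np.linspace(0, 100, n_grid_points)
        grid = np.unique(np.percentile(X[:, col], quantiles))

    averaged = np.empty(grid.size)
    X_mod = X.copy()
    for i, value in enumerate(grid):
        X_mod[:, col] = value  # overwrite the targeted column everywhere
        averaged[i] = estimator.predict(X_mod).mean()
    return grid, averaged
```

For a categorical column the same loop applies, except the grid is simply the distinct categories, as the docstring notes.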