Path: blob/master/big_data/h2o/__pycache__/h2o_explainers.cpython-36.pyc
2587 views
3
��![� � @ s( d dl jZd dlmZ G dd� d�ZdS )� N)�GridSpecc @ sD e Zd ZdZdd� Zddd�Zddd �Zd
d� Zdd
� Zdd� Z dS )�H2OPartialDependenceExplainerai
Partial Dependence explanation for binary classification H2O models.
Works for both numerical and categorical (enum) features.
Parameters
----------
h2o_model : H2OEstimator
H2O Model that was already fitted on the data.
Attributes
----------
feature_name_ : str
The input feature_name to the .fit unmodified, will
be used in subsequent method.
is_cat_col_ : bool
Whether the feature we're aiming to explain is a
categorical feature or not.
partial_dep_ : DataFrame
A pandas dataframe that contains three columns, the
feature's value and their corresponding mean prediction
and standard deviation of the prediction. e.g.
feature_name mean_response stddev_response
3000.000000 0.284140 0.120659
318631.578947 0.134414 0.076054
634263.157895 0.142961 0.083630
The feature_name column will be the actual feature_name that
we pass to the .fit method whereas the mean_response and
stddev_response column will be fixed columns generated.
c C s
|| _ d S )N)� h2o_model)�selfr � r �@/Users/mingyuliu/machine-learning/big_data/h2o/h2o_explainers.py�__init__( s z&H2OPartialDependenceExplainer.__init__� c C sd || j � d | _| jr6t|| j� d �}t||�}| jj||g|dd�}|| _|d j� | _ | S )a�
Obtain the partial dependence result.
Parameters
----------
data : H2OFrame, shape [n_samples, n_features]
Input data to the H2O estimator/model.
feature_name : str
Feature name in the data what we wish to explain.
n_bins : int, default 20
Number of bins used. For categorical columns, we will make sure the number
of bins exceed the distinct level count.
Returns
-------
self
r F)�cols�nbins�plot)
�isfactorZis_cat_col_�len�levels�maxr �partial_plot�
feature_name_�
as_data_frame�partial_dep_)r �dataZfeature_nameZn_bins�n_levels�partial_depr r r �fit+ s
z!H2OPartialDependenceExplainer.fitTc C sV t dd�}tj|ddd�f �}| j|� tj|dd�dd�f �}| j|||� |S )a�
Use the partial dependence result to generate
a partial dependence plot (using matplotlib).
Parameters
----------
centered : bool, default True
Center the partial dependence plot by subtacting every partial
dependence result table's column value with the value of the first
column, i.e. first column's value will serve as the baseline
(centered at 0) for all other values.
plot_stddev : bool, default True
Apart from plotting the mean partial dependence, also show the
standard deviation as a fill between.
Returns
-------
matplotlib figure
� � r N)r �plt�subplot�_plot_title�
_plot_content)r �centered�plot_stddev�figure�ax1�ax2r r r r J s
z"H2OPartialDependenceExplainer.plotc C sl d}dj | j�}dj | jjd �}d}d}|jd� |jdd|||d � |jdd
|d||d� |jd
� d S )N�Arialz(Partial Dependence Plot for '{}' featurez Number of unique grid points: {}r � � �whitegffffff�?)�fontsize�fontnameg�������?�grey)�colorr( r) �off)�formatr r �shape�
set_facecolor�text�axis)r �ax�font_family�titleZsubtitle�title_fontsizeZsubtitle_fontsizer r r r f s
z)H2OPartialDependenceExplainer._plot_titlec C s� d}d}d}d}d}d} d}
d}| j d }|r:||d
8 }| j d }
||
}||
}| j | j }|j||||d|d
� |j|d
g|j |
d| d� |r�|j|||||d� |j| j|d� | j|� d S )N� r z#1A4E5Dg�������?z#66C2D7g �?z#E75438�
Z
mean_responser Zstddev_response�o)r+ � linewidth�marker�
markersizez--)r+ � linestyler9 )�alphar+ )r( )r r r �size�fill_between�
set_xlabel�_modify_axis)r r2 r r Zpd_linewidthZ
pd_markersizeZpd_colorZ
fill_alphaZ
fill_colorZzero_linewidthZ
zero_colorZxlabel_fontsizeZpd_mean�std�upper�lower�xr r r r v s2
z+H2OPartialDependenceExplainer._plot_contentc
C s� d}d}d}|j dd|||d� |jd� |j� j� |j� j� xdD ]}|j| jd� qHW x$dD ]}|jdd|ddddd� qfW d S )N� z#9E9E9Ez#424242�both�major)r1 �which�colors� labelsize�
labelcolorr'