ethen8181
GitHub Repository: ethen8181/machine-learning
Path: blob/master/big_data/h2o/__pycache__/h2o_explainers.cpython-36.pyc

��![��@s(ddljZddlmZGdd�d�ZdS)�N)�GridSpecc@sDeZdZdZdd�Zddd�Zddd	�Zd
d�Zdd
�Zdd�Z	dS)�H2OPartialDependenceExplainerai
    Partial Dependence explanation for binary classification H2O models.
    Works for both numerical and categorical (enum) features.

    Parameters
    ----------
    h2o_model : H2OEstimator
        H2O Model that was already fitted on the data.

    Attributes
    ----------
    feature_name_ : str
        The feature_name passed to .fit, stored unmodified for
        use by subsequent methods.

    is_cat_col_ : bool
        Whether the feature we're aiming to explain is a
        categorical feature or not.

    partial_dep_ : DataFrame
        A pandas DataFrame with three columns: the feature's
        values, the corresponding mean prediction, and the
        standard deviation of the prediction, e.g.

        feature_name    mean_response stddev_response
        3000.000000     0.284140      0.120659
        318631.578947   0.134414      0.076054
        634263.157895   0.142961      0.083630

        The first column is named after the feature_name passed
        to the .fit method, whereas the mean_response and
        stddev_response column names are fixed.
    cCs
||_dS)N)�	h2o_model)�selfr�r�@/Users/mingyuliu/machine-learning/big_data/h2o/h2o_explainers.py�__init__(sz&H2OPartialDependenceExplainer.__init__�cCsd||j�d|_|jr6t||j�d�}t||�}|jj||g|dd�}||_|dj�|_	|S)a�
        Obtain the partial dependence result.

        Parameters
        ----------
        data : H2OFrame, shape [n_samples, n_features]
            Input data to the H2O estimator/model.

        feature_name : str
            Feature name in the data that we wish to explain.

        n_bins : int, default 20
            Number of bins used. For categorical columns, we make sure the number
            of bins exceeds the distinct level count.

        Returns
        -------
        self
        rF)�cols�nbins�plot)
�isfactorZis_cat_col_�len�levels�maxr�partial_plot�
feature_name_�
as_data_frame�partial_dep_)r�dataZfeature_nameZn_bins�n_levels�partial_deprrr�fit+s

z!H2OPartialDependenceExplainer.fitTcCsVtdd�}tj|ddd�f�}|j|�tj|dd�dd�f�}|j|||�|S)a�
        Use the partial dependence result to generate
        a partial dependence plot (using matplotlib).

        Parameters
        ----------
        centered : bool, default True
            Center the partial dependence plot by subtracting the first
            mean response value from every mean response value, i.e. the
            first value serves as the baseline (centered at 0) for all
            other values.

        plot_stddev : bool, default True
            Apart from plotting the mean partial dependence, also show the
            standard deviation as a fill between.

        Returns
        -------
        matplotlib figure
        ��rN)r�plt�subplot�_plot_title�
_plot_content)r�centered�plot_stddev�figure�ax1�ax2rrrrJs

z"H2OPartialDependenceExplainer.plotcCsld}dj|j�}dj|jjd�}d}d}|jd�|jdd|||d	�|jdd
|d||d�|jd
�dS)N�Arialz(Partial Dependence Plot for '{}' featurez Number of unique grid points: {}r���whitegffffff�?)�fontsize�fontnameg�������?�grey)�colorr(r)�off)�formatrr�shape�
set_facecolor�text�axis)r�ax�font_family�titleZsubtitle�title_fontsizeZsubtitle_fontsizerrrrfs


z)H2OPartialDependenceExplainer._plot_titlecCs�d}d}d}d}d}d}	d}
d}|jd	}|r:||d
8}|jd}
||
}||
}|j|j}|j||||d|d
�|j|d
g|j|
d|	d�|r�|j|||||d�|j|j|d�|j|�dS)N�rz#1A4E5Dg�������?z#66C2D7g�?z#E75438�
Z
mean_responserZstddev_response�o)r+�	linewidth�marker�
markersizez--)r+�	linestyler9)�alphar+)r()rrr�size�fill_between�
set_xlabel�_modify_axis)rr2rr Zpd_linewidthZ
pd_markersizeZpd_colorZ
fill_alphaZ
fill_colorZzero_linewidthZ
zero_colorZxlabel_fontsizeZpd_mean�std�upper�lower�xrrrrvs2



z+H2OPartialDependenceExplainer._plot_contentc
Cs�d}d}d}|jdd|||d�|jd�|j�j�|j�j�xdD]}|j|jd�qHWx$dD]}|jdd|ddddd�qfWdS)N�z#9E9E9Ez#424242�both�major)r1�which�colors�	labelsize�
labelcolorr'�top�left�right�bottomFrE�yTz--g�?�kg333333�?)�ls�lw�cr=)rMrNrOrP)rErQ)	�tick_paramsr/�	get_yaxis�	tick_left�	get_xaxis�tick_bottom�spines�set_visible�grid)rr2Ztick_labelsizeZtick_colorsZtick_labelcolor�	directionr1rrrrA�s



z*H2OPartialDependenceExplainer._modify_axisN)r	)TT)
�__name__�
__module__�__qualname__�__doc__rrrrrrArrrrrs!

$r)�matplotlib.pyplot�pyplotr�matplotlib.gridspecrrrrrr�<module>s
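The centering and standard deviation band described in the plot docstring can be illustrated with a small dependency-free sketch. The helper name `center_partial_dependence` is hypothetical and not part of the module; the input values are the three example rows from the `partial_dep_` docstring. It assumes centering subtracts the first mean response from all others, and that the band is the mean plus or minus one standard deviation.

```python
def center_partial_dependence(mean_response, stddev_response, centered=True):
    """Return (mean, lower, upper) lists for a partial dependence band.

    Hypothetical helper for illustration only; not part of the module.
    """
    mean = list(mean_response)
    if centered and mean:
        # Use the first grid point as the baseline, so the curve starts at 0.
        baseline = mean[0]
        mean = [m - baseline for m in mean]

    # Band of one standard deviation around the (possibly centered) mean.
    upper = [m + s for m, s in zip(mean, stddev_response)]
    lower = [m - s for m, s in zip(mean, stddev_response)]
    return mean, lower, upper


# Example rows taken from the partial_dep_ docstring.
mean, lower, upper = center_partial_dependence(
    [0.284140, 0.134414, 0.142961],
    [0.120659, 0.076054, 0.083630],
)
```

With centering enabled, the first entry of `mean` is zero and every other entry reads as a change in mean prediction relative to the first grid point.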