GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/probability/examples/HLM_TFP_R_Stan.ipynb
²⁵¹¹⁸ views

Kernel: Python 3

Copyright 2018 The TensorFlow Probability Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

{TF Probability, R, Stan}의 선형 혼합 효과 회귀

1. 소개

이 colab에서는 선형 혼합 효과 회귀 모델을 인기 있는 장난감 데이터세트에 피팅합니다. R의 lme4, Stan의 혼합 효과 패키지, 및 TensorFlow Probability(TFP) 프리미티브를 사용하여 이 피팅을 세 번 적용합니다. 세 가지 모두가 대략적으로 동일한 피팅 매개변수와 사후 분포를 제공함을 보여줌으로써 결론을 내립니다.

가장 중요한 결론은 TFP가 HLM과 같은 모델을 피팅하는 데 필요한 일반적인 부분을 가지고 있으며, 다른 소프트웨어 패키지(예: lme4, rstanarm)와 일관된 결과를 생성한다는 것입니다. 이 colab은 비교된 패키지의 계산 효율성을 정확하게 반영하지 않습니다.

In [ ]:

%matplotlib inline

import os
from six.moves import urllib
import numpy as np
import pandas as pd
import warnings

from matplotlib import pyplot as plt
import seaborn as sns

from IPython.core.pylabtools import figsize
figsize(11, 9)

import tensorflow.compat.v1 as tf
import tensorflow_datasets as tfds
import tensorflow_probability as tfp

2 계층적 선형 모델

R, Stan, TFP를 비교하기 위해 계층적 선형 모델(HLM)을 Gelman 등의 베이지안 데이터 분석(559페이지, 2차 개정판; 250페이지, 3차 개정판)에서 잘 알려지게 된 Radon 데이터세트에 피팅할 것입니다.

다음 생성 모델을 가정합니다.

\begin{align*} \text{for } & c=1\ldots \text{NumCounties}:\\ & \beta_c \sim \text{Normal}\left(\text{loc}=0, \text{scale}=\sigma_C \right) \\ \text{for } & i=1\ldots \text{NumSamples}:\\ &\eta_i = \underbrace{\omega_0 + \omega_1 \text{Floor}_i}_\text{fixed effects} + \underbrace{\beta_{ \text{County}_i} \log( \text{UraniumPPM}_{\text{County}_i}))}_\text{random effects} \\ &\log(\text{Radon}_i) \sim \text{Normal}(\text{loc}=\eta_i , \text{scale}=\sigma_N) \end{align*}

R의 lme4 "틸데 표기법"에서 이 모델은 다음과 같습니다.

log_radon ~ 1 + floor + (0 + log_uranium_ppm | county)

$\{\beta_c\}_{c=1}^\text{NumCounties}$ 의 사후 분포(증거에 따라 조건부)를 사용하여 $\omega, \sigma_C, \sigma_N$ 에 대한 MLE를 찾습니다.

본질적으로 동일하지만 임의의 절편을 가진 모델에 대해서는 *부록 A*를 참조하세요.

HLM에 대한 보다 일반적인 규정 사항은 *부록 B*를 참조하세요.

3 데이터 먼징(Data Munging)

이 섹션에서는 radon 데이터세트를 얻고 가정된 모델을 준수하도록 몇 가지 최소한의 전처리를 수행합니다.

In [ ]:

def load_and_preprocess_radon_dataset(state='MN'):
  """Preprocess Radon dataset as done in "Bayesian Data Analysis" book.
  
  We filter to Minnesota data (919 examples) and preprocess to obtain the
  following features:
  - `log_uranium_ppm`: Log of soil uranium measurements.
  - `county`: Name of county in which the measurement was taken.
  - `floor`: Floor of house (0 for basement, 1 for first floor) on which the
    measurement was taken.

  The target variable is `log_radon`, the log of the Radon measurement in the
  house.
  """
  ds = tfds.load('radon', split='train')
  radon_data = tfds.as_dataframe(ds)
  radon_data.rename(lambda s: s[9:] if s.startswith('feat') else s, axis=1, inplace=True)
  df = radon_data[radon_data.state==state.encode()].copy()

  # For any missing or invalid activity readings, we'll use a value of `0.1`.
  df['radon'] = df.activity.apply(lambda x: x if x > 0. else 0.1)
  # Make county names look nice. 
  df['county'] = df.county.apply(lambda s: s.decode()).str.strip().str.title()
  # Remap categories to start from 0 and end at max(category).
  county_name = sorted(df.county.unique())
  df['county'] = df.county.astype(
      pd.api.types.CategoricalDtype(categories=county_name)).cat.codes
  county_name = list(map(str.strip, county_name))

  df['log_radon'] = df['radon'].apply(np.log)
  df['log_uranium_ppm'] = df['Uppm'].apply(np.log)
  df = df[['idnum', 'log_radon', 'floor', 'county', 'log_uranium_ppm']]

  return df, county_name

In [ ]:

radon, county_name = load_and_preprocess_radon_dataset()

In [ ]:

# We'll use the following directory to store our preprocessed dataset.
CACHE_DIR = os.path.join(os.sep, 'tmp', 'radon')

# Save processed data. (So we can later read it in R.)
if not tf.gfile.Exists(CACHE_DIR):
  tf.gfile.MakeDirs(CACHE_DIR)
with tf.gfile.Open(os.path.join(CACHE_DIR, 'radon.csv'), 'w') as f:
  radon.to_csv(f, index=False)

3.1 데이터 알기

이 섹션에서는 제안된 모델이 합리적일 수 있는 이유를 더 잘 이해하기 위해 radon 데이터세트를 살펴봅니다.

In [ ]:

radon.head()

In [ ]:

fig, ax = plt.subplots(figsize=(22, 5));
county_freq = radon['county'].value_counts()
county_freq.plot(kind='bar', color='#436bad');
plt.xlabel('County index')
plt.ylabel('Number of radon readings')
plt.title('Number of radon readings per county', fontsize=16)
county_freq = np.array(zip(county_freq.index, county_freq.values))  # We'll use this later.

In [ ]:

fig, ax = plt.subplots(ncols=2, figsize=[10, 4]);

radon['log_radon'].plot(kind='density', ax=ax[0]);
ax[0].set_xlabel('log(radon)')

radon['floor'].value_counts().plot(kind='bar', ax=ax[1]);
ax[1].set_xlabel('Floor');
ax[1].set_ylabel('Count');

fig.subplots_adjust(wspace=0.25)

결론:

85개 카운티의 롱테일이 있습니다(GLMM에서 흔히 발생).
실제로 $\log(\text{Radon})$ 는 제약이 없습니다. 따라서 선형 회귀가 의미가 있을 수 있습니다.
판독값은 $0$ 번째 층에서 가장 많이 이루어집니다. $1$ 층 이상에서는 판독이 이루어지지 않았습니다. 따라서 고정 효과에는 두 개의 가중치만 있습니다.

4 R의 HLM

이 섹션에서는 위에서 설명한 확률 모델에 맞추기 위해 R의 lme4 패키지를 사용합니다.

참고: 이 섹션을 실행하려면 R colab 런타임으로 전환해야 합니다.

In [ ]:

suppressMessages({
  library('bayesplot')
  library('data.table')
  library('dplyr')
  library('gfile')
  library('ggplot2')
  library('lattice')
  library('lme4')
  library('plyr')
  library('rstanarm')
  library('tidyverse')
  RequireInitGoogle()
})

In [ ]:

data = read_csv(gfile::GFile('/tmp/radon/radon.csv'))

Parsed with column specification:
cols(
  log_radon = col_double(),
  floor = col_integer(),
  county = col_integer(),
  log_uranium_ppm = col_double()
)

In [ ]:

head(data)

# A tibble: 6 x 4
  log_radon floor county log_uranium_ppm
      <dbl> <int>  <int>           <dbl>
1     0.788     1      0          -0.689
2     0.788     0      0          -0.689
3     1.06      0      0          -0.689
4     0         0      0          -0.689
5     1.13      0      1          -0.847
6     0.916     0      1          -0.847

In [ ]:

# https://github.com/stan-dev/example-models/wiki/ARM-Models-Sorted-by-Chapter
radon.model <- lmer(log_radon ~ 1 + floor  + (0 + log_uranium_ppm | county), data = data)

In [ ]:

summary(radon.model)

Linear mixed model fit by REML ['lmerMod']
Formula: log_radon ~ 1 + floor + (0 + log_uranium_ppm | county)
   Data: data

REML criterion at convergence: 2166.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.5202 -0.6064  0.0107  0.6334  3.4111 

Random effects:
 Groups   Name            Variance Std.Dev.
 county   log_uranium_ppm 0.7545   0.8686  
 Residual                 0.5776   0.7600  
Number of obs: 919, groups:  county, 85

Fixed effects:
            Estimate Std. Error t value
(Intercept)  1.47585    0.03899   37.85
floor       -0.67974    0.06963   -9.76

Correlation of Fixed Effects:
      (Intr)
floor -0.330

In [ ]:

qqmath(ranef(radon.model, condVar=TRUE))

$county

In [ ]:

write.csv(as.data.frame(ranef(radon.model, condVar = TRUE)), '/tmp/radon/lme4_fit.csv')

5 Stan의 HLM

이 섹션에서는 위 lme4 모델과 동일한 공식/구문을 사용하여 Stan 모델을 피팅하기 위해 rstanarm을 사용합니다.

아래의 lme4 및 TF 모델과 달리 rstanarm은 완전한 베이지안 모델입니다. 즉, 분포에서 매개변수 자체를 가져온 정규 분포에서 모든 매개변수를 가져온 것으로 가정합니다.

참고: 이 섹션을 실행하려면 R colab 런타임으로 전환해야 합니다.

In [ ]:

fit <- stan_lmer(log_radon ~ 1 + floor  + (0 + log_uranium_ppm | county), data = data)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).

Chain 1, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1, Iteration: 2000 / 2000 [100%]  (Sampling)
 Elapsed Time: 7.73495 seconds (Warm-up)
               2.98852 seconds (Sampling)
               10.7235 seconds (Total)


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).

Chain 2, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 2, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 2, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 2, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 2, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 2, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 2, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 2, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 2, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 2, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 2, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 2, Iteration: 2000 / 2000 [100%]  (Sampling)
 Elapsed Time: 7.51252 seconds (Warm-up)
               3.08653 seconds (Sampling)
               10.5991 seconds (Total)


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).

Chain 3, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 3, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 3, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 3, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 3, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 3, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 3, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 3, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 3, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 3, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 3, Iteration: 2000 / 2000 [100%]  (Sampling)
 Elapsed Time: 8.14628 seconds (Warm-up)
               3.01001 seconds (Sampling)
               11.1563 seconds (Total)


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).

Chain 4, Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4, Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 4, Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 4, Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 4, Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 4, Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 4, Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4, Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4, Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4, Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4, Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4, Iteration: 2000 / 2000 [100%]  (Sampling)
 Elapsed Time: 7.6801 seconds (Warm-up)
               3.23663 seconds (Sampling)
               10.9167 seconds (Total)

참고: 런타임은 단일 CPU 코어에서 제공됩니다. (이 colab은 Stan 또는 TFP 런타임을 충실하게 나타내기 위한 것이 아닙니다.)

In [ ]:

fit

stan_lmer(formula = log_radon ~ 1 + floor + (0 + log_uranium_ppm | 
    county), data = data)

Estimates:
            Median MAD_SD
(Intercept)  1.5    0.0  
floor       -0.7    0.1  
sigma        0.8    0.0  

Error terms:
 Groups   Name            Std.Dev.
 county   log_uranium_ppm 0.87    
 Residual                 0.76    
Num. levels: county 85 

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 1.2    0.0   

Observations: 919  Number of unconstrained parameters: 90 

In [ ]:

color_scheme_set("red")
ppc_dens_overlay(y = fit$y,
                 yrep = posterior_predict(fit, draws = 50))

In [ ]:

color_scheme_set("brightblue")
ppc_intervals(
  y = data$log_radon,
  yrep = posterior_predict(fit),
  x = data$county,
  prob = 0.8
) +
  labs(
    x = "County",
    y = "log radon",
    title = "80% posterior predictive intervals \nvs observed log radon",
    subtitle = "by county"
  ) +
  panel_bg(fill = "gray95", color = NA) +
  grid_lines(color = "white")

In [ ]:

# Write the posterior samples (4000 for each variable) to a CSV.
write.csv(tidy(as.matrix(fit)), "/tmp/radon/stan_fit.csv")

참고: Python TF 커널 런타임으로 다시 전환하세요.

In [ ]:

with tf.gfile.Open('/tmp/radon/lme4_fit.csv', 'r') as f:
  lme4_fit = pd.read_csv(f, index_col=0)

In [ ]:

lme4_fit.head()

나중에 시각화하기 위해 lme4에서 그룹 무작위 효과에 대한 점 추정치와 조건부 표준 편차를 가져옵니다.

In [ ]:

posterior_random_weights_lme4 = np.array(lme4_fit.condval, dtype=np.float32)
lme4_prior_scale = np.array(lme4_fit.condsd, dtype=np.float32)
print(posterior_random_weights_lme4.shape, lme4_prior_scale.shape)

(85,) (85,)

lme4 추정 평균과 표준 편차를 사용하여 카운티 가중치에 대한 샘플을 끌어옵니다.

In [ ]:

with tf.Session() as sess:
  lme4_dist = tfp.distributions.Independent(
      tfp.distributions.Normal(
          loc=posterior_random_weights_lme4,
          scale=lme4_prior_scale),
      reinterpreted_batch_ndims=1)
  posterior_random_weights_lme4_final_ = sess.run(lme4_dist.sample(4000))

In [ ]:

posterior_random_weights_lme4_final_.shape

(4000, 85)

또한 Stan 피팅으로부터 카운티 가중치의 사후 샘플을 가져옵니다.

In [ ]:

with tf.gfile.Open('/tmp/radon/stan_fit.csv', 'r') as f:
  samples = pd.read_csv(f, index_col=0)

In [ ]:

samples.head()

In [ ]:

posterior_random_weights_cols = [
    col for col in samples.columns if 'b.log_uranium_ppm.county' in col
]
posterior_random_weights_final_stan = samples[
    posterior_random_weights_cols].values
print(posterior_random_weights_final_stan.shape)

(4000, 85)

이 Stan 예제는 확률 모델을 직접 지정하여 TFP에 더 가까운 스타일로 LMER을 구현하는 방법을 보여줍니다.

6 TF 확률의 HLM

이 섹션에서는 계층적 선형 모델을 지정하고 미지의 매개변수를 피팅하기 위해 낮은 수준의 TensorFlow Probability 프리미티브(Distributions)를 사용합니다.

In [ ]:

# Handy snippet to reset the global graph and global session.
with warnings.catch_warnings():
  warnings.simplefilter('ignore')
  tf.reset_default_graph()
  try:
    sess.close()
  except:
    pass
  sess = tf.InteractiveSession()

6.1 모델 지정

이 섹션에서는 TFP 프리미티브를 사용하여 Radon 선형 혼합 효과 모델을 지정합니다. 이를 위해 두 개의 TFP 분포를 생성하는 두 개의 함수를 지정합니다.

make_weights_prior: 임의 가중치(선형 예측기를 계산하기 위해 $\log(\text{UraniumPPM}_{c_i})$ 를 곱함)에 대한 다변량 Normal 사전값.
make_log_radon_likelihood: 관찰된 각 $\log(\text{Radon}_i)$ 종속 변수에 대한 Normal 분포의 배치

이러한 각 분포의 매개변수를 피팅하려고 하므로 TF 변수(즉, tf.get_variable)를 사용해야 합니다. 그러나 제약 없는 최적화를 사용해야 하기 때문에 실제 값을 제약하여 표준 편차를 나타내는 양수와 같이 필요한 의미를 얻을 수 있는 방법을 찾아야 합니다.

In [ ]:

inv_scale_transform = lambda y: np.log(y)  # Not using TF here.
fwd_scale_transform = tf.exp

다음 함수는 사전값 $p(\beta|\sigma_C)$ 를 구성합니다. 여기서 $\beta$ 는 무작위 효과 가중치를 나타내고 $\sigma_C$ 는 표준 편차를 나타냅니다.

tf.make_template을 사용하여 이 함수에 대한 첫 번째 호출이 사용되는 TF 변수를 인스턴스화하고 이후의 모든 호출이 변수의 현재 값을 재사용하도록 합니다.

In [ ]:

def _make_weights_prior(num_counties, dtype):
  """Returns a `len(log_uranium_ppm)` batch of univariate Normal."""
  raw_prior_scale = tf.get_variable(
      name='raw_prior_scale',
      initializer=np.array(inv_scale_transform(1.), dtype=dtype))
  return tfp.distributions.Independent(
      tfp.distributions.Normal(
          loc=tf.zeros(num_counties, dtype=dtype),
          scale=fwd_scale_transform(raw_prior_scale)),
      reinterpreted_batch_ndims=1)


make_weights_prior = tf.make_template(
    name_='make_weights_prior', func_=_make_weights_prior)

다음 함수는 가능도 $p(y|x,\omega,\beta,\sigma_N)$ 를 구성합니다. 여기서 $y,x$ 는 응답 및 증거를 나타내고 $\omega,\beta$ 는 고정 및 무작위 효과 가중치를 나타내며, $\sigma_N$ 표준 편차를 나타냅니다.

여기서도 tf.make_template을 사용하여 호출 간에 TF 변수가 재사용되도록 합니다.

In [ ]:

def _make_log_radon_likelihood(random_effect_weights, floor, county,
                               log_county_uranium_ppm, init_log_radon_stddev):
  raw_likelihood_scale = tf.get_variable(
      name='raw_likelihood_scale',
      initializer=np.array(
          inv_scale_transform(init_log_radon_stddev), dtype=dtype))
  fixed_effect_weights = tf.get_variable(
      name='fixed_effect_weights', initializer=np.array([0., 1.], dtype=dtype))
  fixed_effects = fixed_effect_weights[0] + fixed_effect_weights[1] * floor
  random_effects = tf.gather(
      random_effect_weights * log_county_uranium_ppm,
      indices=tf.to_int32(county),
      axis=-1)
  linear_predictor = fixed_effects + random_effects
  return tfp.distributions.Normal(
      loc=linear_predictor, scale=fwd_scale_transform(raw_likelihood_scale))


make_log_radon_likelihood = tf.make_template(
    name_='make_log_radon_likelihood', func_=_make_log_radon_likelihood)

마지막으로, 사전값 및 가능도 생성기를 사용하여 결합 로그-밀도를 구성합니다.

In [ ]:

def joint_log_prob(random_effect_weights, log_radon, floor, county,
                   log_county_uranium_ppm, dtype):
  num_counties = len(log_county_uranium_ppm)
  rv_weights = make_weights_prior(num_counties, dtype)
  rv_radon = make_log_radon_likelihood(
      random_effect_weights,
      floor,
      county,
      log_county_uranium_ppm,
      init_log_radon_stddev=radon.log_radon.values.std())
  return (rv_weights.log_prob(random_effect_weights)
          + tf.reduce_sum(rv_radon.log_prob(log_radon), axis=-1))

6.2 훈련(기대 극대화의 확률적 근사)

선형 혼합 효과 회귀 모델에 피팅하기 위해 기대값 최대화 알고리즘(SAEM)의 확률적 근사 버전을 사용합니다. 기본적인 개념은 예상되는 결합 로그-밀도를 근사하기 위해 사후 샘플을 사용하는 것입니다(E-단계). 그런 다음 이 계산을 최대화하는 매개변수를 찾습니다(M-단계). 좀 더 구체적으로, 고정 소수점 반복은 다음과 같이 주어집니다.

\begin{align*} \text{E}[ \log p(x, Z | \theta) | \theta_0] &\approx \frac{1}{M} \sum_{m=1}^M \log p(x, z_m | \theta), \quad Z_m\sim p(Z | x, \theta_0) && \text{E-step}\\ &=: Q_M(\theta, \theta_0) \\ \theta_0 &= \theta_0 - \eta \left.\nabla_\theta Q_M(\theta, \theta_0)\right|_{\theta=\theta_0} && \text{M-step} \end{align*}

여기서 $x$ 는 증거를 나타내고, $Z$ 는 주변화해야 하는 일부 잠재 변수를 나타내며, $\theta,\theta_0$ 은 가능한 매개변수화를 나타냅니다.

더 자세한 설명은 Bernard Delyon, Marc Lavielle, Eric, Moulines(Ann. Statist., 1999)의 EM 알고리즘의 확률적 근사 버전의 수렴을 참조하세요.

E-단계를 계산하려면 사후에서 샘플링해야 합니다. 여기서 사후는 샘플링하기 쉽지 않기 때문에 HMC(Hamiltonian Monte Carlo)를 사용합니다. HMC는 새로운 샘플을 제안하기 위해 정규화되지 않은 사후 로그-밀도의 그래디언트(매개변수가 아닌 wrt 상태)를 사용하는 Monte Carlo Markov Chain 절차입니다.

정규화되지 않은 사후 로그-밀도를 지정하는 것은 간단합니다. 단순히 우리가 조건화하려는 모든 항목에 "고정된" 결합 로그-밀도일 뿐입니다.

In [ ]:

# Specify unnormalized posterior.

dtype = np.float32

log_county_uranium_ppm = radon[
    ['county', 'log_uranium_ppm']].drop_duplicates().values[:, 1]
log_county_uranium_ppm = log_county_uranium_ppm.astype(dtype)

def unnormalized_posterior_log_prob(random_effect_weights):
  return joint_log_prob(
      random_effect_weights=random_effect_weights,
      log_radon=dtype(radon.log_radon.values),
      floor=dtype(radon.floor.values),
      county=np.int32(radon.county.values),
      log_county_uranium_ppm=log_county_uranium_ppm,
      dtype=dtype)

이제 HMC 전환 커널을 생성하여 E-단계 설정을 완료합니다.

참고 사항:

state_stop_gradient=True를 사용하여 M-단계가 MCMC에서 끌어당겨져 역전파되는 것을 방지합니다. (기억하십시오. E-단계는 이전에 가장 잘 알려진 estimator에서 의도적으로 매개변수화되었기 때문에 역전파할 필요가 없습니다.)
tf.placeholder를 사용하여 궁극적으로 TF 그래프를 실행할 때 이전 반복의 임의 MCMC 샘플을 다음 반복의 체인 값으로 제공할 수 있도록 합니다.
TFP의 적응 step_size 휴리스틱인 tfp.mcmc.hmc_step_size_update_fn을 사용합니다.

In [ ]:

# Set-up E-step.

step_size = tf.get_variable(
    'step_size',
    initializer=np.array(0.2, dtype=dtype),
    trainable=False)

hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=unnormalized_posterior_log_prob,
    num_leapfrog_steps=2,
    step_size=step_size,
    step_size_update_fn=tfp.mcmc.make_simple_step_size_update_policy(
      num_adaptation_steps=None),
    state_gradients_are_stopped=True)

init_random_weights = tf.placeholder(dtype, shape=[len(log_county_uranium_ppm)])

posterior_random_weights, kernel_results = tfp.mcmc.sample_chain(
    num_results=3,
    num_burnin_steps=0,
    num_steps_between_results=0,
    current_state=init_random_weights,
    kernel=hmc)

이제 M-단계를 설정합니다. 이것은 본질적으로 TF에서 수행할 수 있는 최적화와 동일합니다.

In [ ]:

# Set-up M-step.

loss = -tf.reduce_mean(kernel_results.accepted_results.target_log_prob)

global_step = tf.train.get_or_create_global_step()

learning_rate = tf.train.exponential_decay(
    learning_rate=0.1,
    global_step=global_step,
    decay_steps=2,
    decay_rate=0.99)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss, global_step=global_step)

몇 가지 정리 작업으로 마무리합니다. 모든 변수가 초기화되었음을 TF에 알려야 합니다. 또한 프로시저의 각 반복에서 해당 값을 print할 수 있도록 TF 변수에 대한 핸들을 생성합니다.

In [ ]:

# Initialize all variables.

init_op = tf.initialize_all_variables()

In [ ]:

# Grab variable handles for diagnostic purposes.

with tf.variable_scope('make_weights_prior', reuse=True):
  prior_scale = fwd_scale_transform(tf.get_variable(
      name='raw_prior_scale', dtype=dtype))
  
with tf.variable_scope('make_log_radon_likelihood', reuse=True):
  likelihood_scale = fwd_scale_transform(tf.get_variable(
      name='raw_likelihood_scale', dtype=dtype))
  fixed_effect_weights = tf.get_variable(
      name='fixed_effect_weights', dtype=dtype)

6.3 실행

이 섹션에서는 SAEM TF 그래프를 실행합니다. 여기서 주된 개념은 HMC 커널의 마지막 드로우를 다음 반복에 제공하는 것입니다. 이를 위해 sess.run 호출에서 feed_dict를 사용합니다.

In [ ]:

init_op.run()
w_ = np.zeros([len(log_county_uranium_ppm)], dtype=dtype)

In [ ]:

%%time
maxiter = int(1500)
num_accepted = 0
num_drawn = 0
for i in range(maxiter):
  [
      _,
      global_step_,
      loss_,
      posterior_random_weights_,
      kernel_results_,
      step_size_,
      prior_scale_,
      likelihood_scale_,
      fixed_effect_weights_,
  ] = sess.run([
      train_op,
      global_step,
      loss,
      posterior_random_weights,
      kernel_results,
      step_size,
      prior_scale,
      likelihood_scale,
      fixed_effect_weights,
  ], feed_dict={init_random_weights: w_})
  w_ = posterior_random_weights_[-1, :]
  num_accepted += kernel_results_.is_accepted.sum()
  num_drawn += kernel_results_.is_accepted.size
  acceptance_rate = num_accepted / num_drawn
  if i % 100 == 0 or i == maxiter - 1:
    print('global_step:{:>4}  loss:{: 9.3f}  acceptance:{:.4f}  '
          'step_size:{:.4f}  prior_scale:{:.4f}  likelihood_scale:{:.4f}  '
          'fixed_effect_weights:{}'.format(
              global_step_, loss_.mean(), acceptance_rate, step_size_,
              prior_scale_, likelihood_scale_, fixed_effect_weights_))

global_step:   0  loss: 1966.948  acceptance:1.0000  step_size:0.2000  prior_scale:1.0000  likelihood_scale:0.8529  fixed_effect_weights:[ 0.  1.]
global_step: 100  loss: 1165.385  acceptance:0.6205  step_size:0.2040  prior_scale:0.9568  likelihood_scale:0.7611  fixed_effect_weights:[ 1.47523439 -0.66043079]
global_step: 200  loss: 1149.851  acceptance:0.6766  step_size:0.2081  prior_scale:0.7465  likelihood_scale:0.7665  fixed_effect_weights:[ 1.48918796 -0.67058587]
global_step: 300  loss: 1163.464  acceptance:0.6811  step_size:0.2040  prior_scale:0.8445  likelihood_scale:0.7594  fixed_effect_weights:[ 1.46291411 -0.67586178]
global_step: 400  loss: 1158.846  acceptance:0.6808  step_size:0.2081  prior_scale:0.8377  likelihood_scale:0.7574  fixed_effect_weights:[ 1.47349834 -0.68823022]
global_step: 500  loss: 1154.193  acceptance:0.6766  step_size:0.1961  prior_scale:0.8546  likelihood_scale:0.7564  fixed_effect_weights:[ 1.47703862 -0.67521363]
global_step: 600  loss: 1163.903  acceptance:0.6783  step_size:0.2040  prior_scale:0.9491  likelihood_scale:0.7587  fixed_effect_weights:[ 1.48268366 -0.69667786]
global_step: 700  loss: 1163.894  acceptance:0.6767  step_size:0.1961  prior_scale:0.8644  likelihood_scale:0.7617  fixed_effect_weights:[ 1.4719094  -0.66897118]
global_step: 800  loss: 1153.689  acceptance:0.6742  step_size:0.2123  prior_scale:0.8366  likelihood_scale:0.7609  fixed_effect_weights:[ 1.47345769 -0.68343043]
global_step: 900  loss: 1155.312  acceptance:0.6718  step_size:0.2040  prior_scale:0.8633  likelihood_scale:0.7581  fixed_effect_weights:[ 1.47426116 -0.6748783 ]
global_step:1000  loss: 1151.278  acceptance:0.6690  step_size:0.2081  prior_scale:0.8737  likelihood_scale:0.7581  fixed_effect_weights:[ 1.46990883 -0.68891817]
global_step:1100  loss: 1156.858  acceptance:0.6676  step_size:0.2040  prior_scale:0.8716  likelihood_scale:0.7584  fixed_effect_weights:[ 1.47386014 -0.6796245 ]
global_step:1200  loss: 1166.247  acceptance:0.6653  step_size:0.2000  prior_scale:0.8748  likelihood_scale:0.7588  fixed_effect_weights:[ 1.47389269 -0.67626756]
global_step:1300  loss: 1165.263  acceptance:0.6636  step_size:0.2040  prior_scale:0.8771  likelihood_scale:0.7581  fixed_effect_weights:[ 1.47612262 -0.67752427]
global_step:1400  loss: 1158.108  acceptance:0.6640  step_size:0.2040  prior_scale:0.8748  likelihood_scale:0.7587  fixed_effect_weights:[ 1.47534692 -0.6789524 ]
global_step:1499  loss: 1161.030  acceptance:0.6638  step_size:0.1941  prior_scale:0.8738  likelihood_scale:0.7589  fixed_effect_weights:[ 1.47624075 -0.67875224]
CPU times: user 1min 16s, sys: 17.6 s, total: 1min 33s
Wall time: 27.9 s

~1500단계 후에 매개변수 추정치가 안정화된 것 같습니다.

6.4 결과

이제 매개변수를 피팅했으므로 많은 수의 사후 샘플을 생성하고 결과를 연구해 보겠습니다.

In [ ]:

%%time
posterior_random_weights_final, kernel_results_final = tfp.mcmc.sample_chain(
    num_results=int(15e3),
    num_burnin_steps=int(1e3),
    current_state=init_random_weights,
    kernel=tfp.mcmc.HamiltonianMonteCarlo(
      target_log_prob_fn=unnormalized_posterior_log_prob,
      num_leapfrog_steps=2,
      step_size=step_size))

[
    posterior_random_weights_final_,
    kernel_results_final_,
] = sess.run([
    posterior_random_weights_final,
    kernel_results_final,
], feed_dict={init_random_weights: w_})

CPU times: user 1min 42s, sys: 26.6 s, total: 2min 8s
Wall time: 35.1 s

In [ ]:

print('prior_scale: ', prior_scale_)
print('likelihood_scale: ', likelihood_scale_)
print('fixed_effect_weights: ', fixed_effect_weights_)
print('acceptance rate final: ', kernel_results_final_.is_accepted.mean())

prior_scale:  0.873799
likelihood_scale:  0.758913
fixed_effect_weights:  [ 1.47624075 -0.67875224]
acceptance rate final:  0.7448

이제 $\beta_c \log(\text{UraniumPPM}_c)$ 무작위 효과의 상자 및 수염 다이어그램을 구성합니다. 카운티 빈도를 줄여 무작위 효과를 명령합니다.

In [ ]:

x = posterior_random_weights_final_ * log_county_uranium_ppm
I = county_freq[:, 0]
x = x[:, I]
cols = np.array(county_name)[I]
pw = pd.DataFrame(x)
pw.columns = cols

fig, ax = plt.subplots(figsize=(25, 4))
ax = pw.boxplot(rot=80, vert=True);

이 상자 및 수염 다이어그램에서 카운티 수준 $\log(\text{UraniumPPM})$ 무작위 효과의 분산이 데이터세트에서 카운티가 덜 대표될수록 증가하는 것이 관찰됩니다. 직관적으로 이것은 이치에 맞습니다. 즉, 특정 카운티에 대한 증거가 적다면 이 카운티의 영향에 대해 덜 확신해야 합니다.

7 나란히 비교

이제 세 가지 프로시저의 결과를 모두 비교합니다. 이를 위해 Stan과 TFP에 의해 생성된 사후 샘플의 비 파라메트릭 추정치를 계산합니다. 또한 R의 lme4 패키지에 의해 생성된 파라메트릭(근사치) 추정치와 비교할 것입니다.

다음 플롯은 미네소타의 각 카운티에 대한 각 가중치의 사후 분포를 보여줍니다. Stan(빨간색), TFP(파란색) 및 R의 lme4(주황색)에 대한 결과를 보여줍니다. Stan과 TFP의 결과를 음영 처리하므로 이 둘이 합치되면 보라색이 나타날 것으로 예상할 수 있습니다. 단순화를 위해 R의 결과는 음영 처리하지 않습니다. 각 서브 플롯은 단일 카운티를 나타내며 래스터 스캔 순서(즉, 왼쪽에서 오른쪽, 그리고 위에서 아래)의 내림차순 빈도로 정렬됩니다.

In [ ]:

nrows = 17
ncols = 5
fig, ax = plt.subplots(nrows, ncols, figsize=(18, 21), sharey=True, sharex=True)
with warnings.catch_warnings():
  warnings.simplefilter('ignore')
  ii = -1
  for r in range(nrows):
    for c in range(ncols):
      ii += 1
      idx = county_freq[ii, 0]
      sns.kdeplot(
          posterior_random_weights_final_[:, idx] * log_county_uranium_ppm[idx],
          color='blue',
          alpha=.3,
          shade=True,
          label='TFP',
          ax=ax[r][c])
      sns.kdeplot(
          posterior_random_weights_final_stan[:, idx] *
          log_county_uranium_ppm[idx],
          color='red',
          alpha=.3,
          shade=True,
          label='Stan/rstanarm',
          ax=ax[r][c])
      sns.kdeplot(
          posterior_random_weights_lme4_final_[:, idx] *
          log_county_uranium_ppm[idx],
          color='#F4B400',
          alpha=.7,
          shade=False,
          label='R/lme4',
          ax=ax[r][c])
      ax[r][c].vlines(
          posterior_random_weights_lme4[idx] * log_county_uranium_ppm[idx],
          0,
          5,
          color='#F4B400',
          linestyle='--')
      ax[r][c].set_title(county_name[idx] + ' ({})'.format(idx), y=.7)
      ax[r][c].set_ylim(0, 5)
      ax[r][c].set_xlim(-1., 1.)
      ax[r][c].get_yaxis().set_visible(False)
      if ii == 2:
        ax[r][c].legend(bbox_to_anchor=(1.4, 1.7), fontsize=20, ncol=3)
      else:
        ax[r][c].legend_.remove()
  fig.subplots_adjust(wspace=0.03, hspace=0.1)

8 결론

이 colab에서는 선형 혼합 효과 회귀 모델을 radon 데이터세트에 피팅했습니다. R, Stan 및 TensorFlow Probability의 세 가지 소프트웨어 패키지를 시도했습니다. 세 가지 다른 소프트웨어 패키지에 의해 계산된 85개의 사후 분포를 플로팅하여 결론을 내렸습니다.

부록 A: 대체 Radon HLM(무작위 가로채기 추가)

이 섹션에서는 각 카운티와 관련된 무작위 가로채기도 가지고 있는 대체 HLM에 대해 설명합니다.

\begin{align*} \text{for } & c=1\ldots \text{NumCounties}:\\ & \beta_c \sim \text{MultivariateNormal}\left(\text{loc}=\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] , \text{scale}=\left[\begin{array}{cc} \sigma_{11} & 0 \\ \sigma_{12} & \sigma_{22} \end{array}\right] \right) \\ \text{for } & i=1\ldots \text{NumSamples}:\\ & c_i := \text{County}_i \\ &\eta_i = \underbrace{\omega_0 + \omega_1\text{Floor}_i \vphantom{\log( \text{CountyUraniumPPM}_{c_i}))}}_{\text{fixed effects}} + \underbrace{\beta_{c_i,0} + \beta_{c_i,1}\log( \text{CountyUraniumPPM}_{c_i}))}_{\text{random effects}} \\ &\log(\text{Radon}_i) \sim \text{Normal}(\text{loc}=\eta_i , \text{scale}=\sigma) \end{align*}

R의 lme4 "틸데 표기법"에서 이 모델은 다음과 같습니다.

log_radon ~ 1 + floor + (1 + log_county_uranium_ppm | county)

부록 B: 일반화된 선형 혼합 효과 모델

이 섹션에서는 본문에서 사용되는 것보다 더 일반적인 계층적 선형 모델의 특성을 제공합니다. 이 보다 일반적인 모델을 일반화된 선형 혼합 효과 모델(GLMM)이라고 합니다.

GLMM은 일반화된 선형 모델(GLM)을 일반화한 것입니다. GLMM은 샘플에 특정한 무작위 노이즈를 예측된 선형 응답에 도입하여 GLM을 확장합니다. 이것은 거의 드러나지 않는 기능이 더 일반적으로 드러나는 기능과 정보를 공유할 수 있도록 한다는 점에서 일부 유용한 면이 있습니다.

생성 프로세스로서 일반화된 선형 혼합 효과 모델(GLMM)은 다음과 같은 특징이 있습니다.

ParseError: KaTeX parse error: Expected 'EOF', got '#' at position 70: …e{2.45cm}\text{#̲ for each rando…

여기서:

\begin{align} R &amp;= \text{무작위 효과 그룹의 수}\ |C_r| &amp;= \text{그룹 $r$의 범주 수}\ N &amp;= \text{훈련 샘플 수}\ x_i,\omega &amp;\in \mathbb{R}^{D_0}\ D_0 &amp;= \text{고정 효과의 수}\ C_r(i) &amp;= \text{$i$th 샘플의 범주(그룹 $r$ 아래)}\ z_{r,i} &amp;\in \mathbb{R}^{D_r }\ D_r &amp;= \text{그룹 $r$와 연관된 무작위 효과의 수}\ \Sigma_{r} &amp;\in {S\in\mathbb{R}^{D_r \times D_r} : S \succ 0 }\ \eta_i\mapsto g^{-1}(\eta_i) &amp;= \mu_i, \text{역링크 함수}\ \text{Distribution} &amp;=\text{평균에 의해서만 파라미터화할 수 있는 일부 분포} \end{align}

즉, 각 그룹의 모든 범주는 iid MVN, $\beta_{rc}$ 와 연결되어 있습니다. $\beta_{rc}$ 추첨이 항상 독립적이지만 $r$ 그룹에 대해서만 동일하게 배포됩니다. 각 $r\in{1,\ldots,R}$ 에 대해 정확히 하나의 $\Sigma_r$ 가 있음에 주목하세요.

샘플 그룹의 특성, $z_{r,i}$ 과 유사하게 결합하면 결과는 $i$ 번째 예측 선형 응답에 대한 샘플별 노이즈입니다(그렇지 않으면 $x_i^\top\omega$ ).

$\{\Sigma_r:r\in\{1,\ldots,R\}\}$ 를 추정할 때 본질적으로 무작위 효과 그룹이 전달하는 노이즈의 양을 추정합니다. 그렇지 않으면 $x_i^\top\omega에 있는 신호를 추출합니다.

$\text{Distribution}$ , 역링크 함수 및 $g^{-1}$ 에 대한 다양한 옵션이 있습니다. 일반적인 옵션은 다음과 같습니다.

$Y_i\sim\text{Normal}(\text{mean}=\eta_i, \text{scale}=\sigma)$ ,
ParseError: KaTeX parse error: Expected 'EOF', got '_' at position 81: …i), \text{total_̲count}=n_i), 및
$Y_i\sim\text{Poisson}(\text{mean}=\exp(\eta_i))$ .

더 많은 가능성은 tfp.glm 모듈을 참조하세요.