Copyright 2018 The TensorFlow Probability Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Probabilistic PCA

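Probabilistic principal components analysis (PCA) is a dimensionality reduction technique that analyzes data via a lower-dimensional latent space (Tipping and Bishop 1999). It is often used when the data has missing values, or for multidimensional scaling.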

Imports

In [ ]:

import functools
import warnings

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

import tensorflow.compat.v2 as tf
import tensorflow_probability as tfp

from tensorflow_probability import bijectors as tfb
from tensorflow_probability import distributions as tfd

tf.enable_v2_behavior()

plt.style.use("ggplot")
warnings.filterwarnings('ignore')

Model

Consider a data set $\mathbf{X} = \{\mathbf{x}_n\}$ of $N$ data points, where each data point is $D$-dimensional, $\mathbf{x}_n \in \mathbb{R}^D$. We aim to represent each $\mathbf{x}_n$ by a latent variable $\mathbf{z}_n \in \mathbb{R}^K$ of lower dimension, $K < D$. The set of principal axes $\mathbf{W}$ relates the latent variables to the data.

Specifically, we assume that each latent variable is normally distributed,

\begin{equation*} \mathbf{z}_n \sim N(\mathbf{0}, \mathbf{I}). \end{equation*}

The corresponding data point is generated via a projection,

\begin{equation*} \mathbf{x}_n \mid \mathbf{z}_n \sim N(\mathbf{W}\mathbf{z}_n, \sigma^2\mathbf{I}), \end{equation*}

where the matrix $\mathbf{W}\in\mathbb{R}^{D\times K}$ holds the principal axes. In probabilistic PCA, we are typically interested in estimating the principal axes $\mathbf{W}$ and the noise term $\sigma^2$.

Probabilistic PCA generalizes classical PCA. Marginalizing out the latent variable, the distribution of each data point is

\begin{equation*} \mathbf{x}_n \sim N(\mathbf{0}, \mathbf{W}\mathbf{W}^\top + \sigma^2\mathbf{I}). \end{equation*}
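
The marginal covariance above can be checked numerically. The following is a minimal NumPy sketch, not part of the original notebook: the sizes `D`, `K`, `N`, the value of `sigma`, and the randomly drawn `W` are all illustrative assumptions. It draws data by ancestral sampling and compares the sample covariance with $\mathbf{W}\mathbf{W}^\top + \sigma^2\mathbf{I}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and values (assumptions, not taken from the notebook).
D, K, N = 3, 2, 200_000
sigma = 0.5
W = rng.normal(size=(D, K))

# Ancestral sampling: z_n ~ N(0, I), then x_n | z_n ~ N(W z_n, sigma^2 I).
Z = rng.normal(size=(K, N))
X = W @ Z + sigma * rng.normal(size=(D, N))

# The data has zero mean, so the sample covariance is X X^T / N.
empirical_cov = X @ X.T / N
analytic_cov = W @ W.T + sigma**2 * np.eye(D)

max_abs_error = np.max(np.abs(empirical_cov - analytic_cov))
print("largest entrywise gap:", max_abs_error)
```

The gap should shrink as `N` grows, since the only discrepancy is Monte Carlo error.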

Classical PCA is the special case of probabilistic PCA in which the covariance of the noise becomes infinitesimally small, $\sigma^2 \to 0$.
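
One way to see this limit is through the eigendecomposition of the sample covariance. The sketch below is an illustration under stated assumptions, not the notebook's method: it fixes a single principal axis `W` (values chosen to resemble the axes printed later in this notebook), simulates data for two noise levels, and reports how well the leading eigenvector aligns with `W` and how large the smallest eigenvalue is. With isotropic noise the leading direction should match `W` at any noise level, while the smallest eigenvalue should track $\sigma^2$ and vanish as the noise goes to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# One principal axis in two dimensions (illustrative values).
W = np.array([[2.28], [-1.16]])
N = 5000
w_direction = W[:, 0] / np.linalg.norm(W)

def covariance_spectrum(sigma):
    """Simulate x = W z + noise and summarize the sample covariance."""
    Z = rng.normal(size=(1, N))
    X = W @ Z + sigma * rng.normal(size=(2, N))
    sample_cov = X @ X.T / N
    # eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(sample_cov)
    # Eigenvectors are defined up to sign, so compare absolute cosines.
    alignment = abs(eigenvectors[:, -1] @ w_direction)
    return alignment, eigenvalues[0]

results = {sigma: covariance_spectrum(sigma) for sigma in (0.5, 0.01)}
for sigma, (alignment, smallest) in results.items():
    print(f"sigma={sigma}: |cos angle|={alignment:.5f}, smallest eigenvalue={smallest:.6f}")
```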

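We set up our model below. In our analysis, we assume $\sigma$ is known, and instead of point-estimating $\mathbf{W}$ as a model parameter, we place a prior over it in order to infer a distribution over principal axes. We express the model as a TFP JointDistribution; specifically, we use JointDistributionCoroutineAutoBatched.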

In [ ]:

def probabilistic_pca(data_dim, latent_dim, num_datapoints, stddv_datapoints):
  w = yield tfd.Normal(loc=tf.zeros([data_dim, latent_dim]),
                 scale=2.0 * tf.ones([data_dim, latent_dim]),
                 name="w")
  z = yield tfd.Normal(loc=tf.zeros([latent_dim, num_datapoints]),
                 scale=tf.ones([latent_dim, num_datapoints]),
                 name="z")
  x = yield tfd.Normal(loc=tf.matmul(w, z),
                       scale=stddv_datapoints,
                       name="x")

In [ ]:

num_datapoints = 5000
data_dim = 2
latent_dim = 1
stddv_datapoints = 0.5

concrete_ppca_model = functools.partial(probabilistic_pca,
    data_dim=data_dim,
    latent_dim=latent_dim,
    num_datapoints=num_datapoints,
    stddv_datapoints=stddv_datapoints)

model = tfd.JointDistributionCoroutineAutoBatched(concrete_ppca_model)
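
The model function above is a Python generator: each `yield` hands a distribution to whoever drives the generator and receives a value for that variable in return. The pure-Python sketch below illustrates that protocol only. `ToyNormal`, `toy_model`, and `run` are hypothetical names invented for this illustration, the variables are scalars, and none of this is TFP's actual implementation (which also handles batching, log-probabilities, and more).

```python
import random

class ToyNormal:
    """Hypothetical stand-in for a distribution object; not a TFP class."""

    def __init__(self, loc, scale, name):
        self.loc, self.scale, self.name = loc, scale, name

    def sample(self, rng):
        return rng.gauss(self.loc, self.scale)

def toy_model():
    # Each `yield` passes a distribution out and gets a value back.
    w = yield ToyNormal(0.0, 2.0, name="w")
    z = yield ToyNormal(0.0, 1.0, name="z")
    yield ToyNormal(w * z, 0.5, name="x")

def run(model_fn, rng, pinned=None):
    """Drive the generator, sampling every variable that is not pinned."""
    pinned = pinned or {}
    values = {}
    gen = model_fn()
    dist = next(gen)
    while True:
        value = pinned[dist.name] if dist.name in pinned else dist.sample(rng)
        values[dist.name] = value
        try:
            dist = gen.send(value)  # resume the model with the chosen value
        except StopIteration:
            return values

rng = random.Random(0)
prior_draw = run(toy_model, rng)
conditioned_draw = run(toy_model, rng, pinned={"w": 2.0, "z": 3.0})
print(prior_draw)
print(conditioned_draw)
```

Pinning some variables and sampling the rest is the same idea as passing `value=` to `model.sample` later in this notebook.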

The Data

We can use the model to generate data by sampling from the joint prior distribution.

In [ ]:

actual_w, actual_z, x_train = model.sample()

print("Principal axes:")
print(actual_w)

Principal axes:
tf.Tensor(
[[ 2.2801023]
 [-1.1619819]], shape=(2, 1), dtype=float32)

We visualize the dataset.

In [ ]:

plt.scatter(x_train[0, :], x_train[1, :], color='blue', alpha=0.1)
plt.axis([-20, 20, -20, 20])
plt.title("Data set")
plt.show()

Maximum a Posteriori Inference

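We first search for the point estimate of the latent variables that maximizes the posterior probability density. This is known as maximum a posteriori (MAP) inference, and is done by calculating the values of $\mathbf{W}$ and $\mathbf{Z}$ that maximize the posterior density $p(\mathbf{W}, \mathbf{Z} \mid \mathbf{X}) \propto p(\mathbf{W}, \mathbf{Z}, \mathbf{X})$.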
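
To make the objective concrete, here is a small NumPy sketch of MAP estimation for this model. It is an illustration under stated assumptions, not the notebook's procedure: it swaps the Adam optimizer for coordinate ascent with closed-form updates (with one block held fixed, the log joint is a concave quadratic in the other), uses a smaller simulated dataset, and fixes `true_w` by hand. The names `log_joint`, `history`, and `prior_scale` are introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 2, 1, 500
sigma, prior_scale = 0.5, 2.0  # noise stddev and prior stddev of w

# Simulated data with a hand-picked axis (illustrative values).
true_w = np.array([[2.28], [-1.16]])
true_z = rng.normal(size=(K, N))
x = true_w @ true_z + sigma * rng.normal(size=(D, N))

def log_joint(w, z):
    """log p(w, z, x) up to an additive constant."""
    log_prior_w = -0.5 * np.sum(w**2) / prior_scale**2
    log_prior_z = -0.5 * np.sum(z**2)
    log_likelihood = -0.5 * np.sum((x - w @ z) ** 2) / sigma**2
    return log_prior_w + log_prior_z + log_likelihood

w = rng.normal(size=(D, K))
z = rng.normal(size=(K, N))
history = [log_joint(w, z)]
for _ in range(50):
    # Maximize over z with w fixed: (I + w^T w / sigma^2) z = w^T x / sigma^2.
    z = np.linalg.solve(w.T @ w / sigma**2 + np.eye(K), w.T @ x / sigma**2)
    # Maximize over w with z fixed: (z z^T / sigma^2 + I / s^2) w^T = z x^T / sigma^2.
    w = np.linalg.solve(
        z @ z.T / sigma**2 + np.eye(K) / prior_scale**2, z @ x.T / sigma**2
    ).T
    history.append(log_joint(w, z))

# Compare directions only; the sign of w is not identifiable.
cosine = abs(w[:, 0] @ true_w[:, 0]) / (np.linalg.norm(w) * np.linalg.norm(true_w))
print("log joint went from", history[0], "to", history[-1])
print("|cos angle| between estimated and true axis:", cosine)
```

The comparison is on direction rather than magnitude because rescaling $\mathbf{W}$ up and $\mathbf{Z}$ down leaves the likelihood unchanged, so only the priors determine the overall scale of the point estimate.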

In [ ]:

w = tf.Variable(tf.random.normal([data_dim, latent_dim]))
z = tf.Variable(tf.random.normal([latent_dim, num_datapoints]))

target_log_prob_fn = lambda w, z: model.log_prob((w, z, x_train))
losses = tfp.math.minimize(
    lambda: -target_log_prob_fn(w, z),
    optimizer=tf.optimizers.Adam(learning_rate=0.05),
    num_steps=200)

In [ ]:

plt.plot(losses)

[<matplotlib.lines.Line2D at 0x7f19897a42e8>]

We can use the model to sample data for the inferred values of $\mathbf{W}$ and $\mathbf{Z}$, and compare it to the actual dataset we conditioned on.

In [ ]:

print("MAP-estimated axes:")
print(w)

_, _, x_generated = model.sample(value=(w, z, None))

plt.scatter(x_train[0, :], x_train[1, :], color='blue', alpha=0.1, label='Actual data')
plt.scatter(x_generated[0, :], x_generated[1, :], color='red', alpha=0.1, label='Simulated data (MAP)')
plt.legend()
plt.axis([-20, 20, -20, 20])
plt.show()

MAP-estimated axes:
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float32, numpy=
array([[ 2.9135954],
       [-1.4826864]], dtype=float32)>

Variational Inference

MAP can be used to find the mode (or one of the modes) of the posterior distribution, but it provides no other insight about the posterior. Next, we use variational inference, where the posterior distribution $p(\mathbf{W}, \mathbf{Z} \mid \mathbf{X})$ is approximated using a variational distribution $q(\mathbf{W}, \mathbf{Z})$ parameterized by $\boldsymbol{\lambda}$. The aim is to find the variational parameters $\boldsymbol{\lambda}$ that *minimize* the KL divergence between q and the posterior, $\mathrm{KL}(q(\mathbf{W}, \mathbf{Z}) \mid\mid p(\mathbf{W}, \mathbf{Z} \mid \mathbf{X}))$, or equivalently, that *maximize* the evidence lower bound (ELBO), $\mathbb{E}_{q(\mathbf{W},\mathbf{Z};\boldsymbol{\lambda})}\left[ \log p(\mathbf{W},\mathbf{Z},\mathbf{X}) - \log q(\mathbf{W},\mathbf{Z}; \boldsymbol{\lambda}) \right]$.
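
The relationship between the ELBO and the KL divergence can be seen on a scalar toy problem where the exact posterior is known. The sketch below is an assumption-laden illustration, not the notebook's model: it uses a single latent $z \sim N(0, 1)$ with one observation $x \mid z \sim N(z, \sigma^2)$, for which the posterior is $N\!\left(x/(1+\sigma^2),\ \sigma^2/(1+\sigma^2)\right)$ and the evidence is $N(x; 0, 1+\sigma^2)$. The values of `sigma` and `x_obs` and the helper names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, x_obs = 0.5, 1.3  # illustrative values

def normal_logpdf(value, loc, scale):
    return -0.5 * np.log(2 * np.pi * scale**2) - 0.5 * ((value - loc) / scale) ** 2

def elbo_estimate(q_loc, q_scale, num_samples=100_000):
    """Monte Carlo estimate of E_q[log p(z, x) - log q(z)]."""
    z = q_loc + q_scale * rng.normal(size=num_samples)
    log_joint = normal_logpdf(z, 0.0, 1.0) + normal_logpdf(x_obs, z, sigma)
    log_q = normal_logpdf(z, q_loc, q_scale)
    return np.mean(log_joint - log_q)

# Exact quantities for this conjugate toy model.
log_evidence = normal_logpdf(x_obs, 0.0, np.sqrt(1.0 + sigma**2))
posterior_loc = x_obs / (1.0 + sigma**2)
posterior_scale = np.sqrt(sigma**2 / (1.0 + sigma**2))

elbo_at_posterior = elbo_estimate(posterior_loc, posterior_scale)
elbo_at_prior = elbo_estimate(0.0, 1.0)

print("log evidence:        ", log_evidence)
print("ELBO, q = posterior: ", elbo_at_posterior)
print("ELBO, q = prior:     ", elbo_at_prior)
```

When q equals the exact posterior the bound is tight; for any other q the ELBO falls below the log evidence by exactly the KL divergence, which is why maximizing one is equivalent to minimizing the other.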

In [ ]:

qw_mean = tf.Variable(tf.random.normal([data_dim, latent_dim]))
qz_mean = tf.Variable(tf.random.normal([latent_dim, num_datapoints]))
qw_stddv = tfp.util.TransformedVariable(1e-4 * tf.ones([data_dim, latent_dim]),
                                        bijector=tfb.Softplus())
qz_stddv = tfp.util.TransformedVariable(
    1e-4 * tf.ones([latent_dim, num_datapoints]),
    bijector=tfb.Softplus())
def factored_normal_variational_model():
  qw = yield tfd.Normal(loc=qw_mean, scale=qw_stddv, name="qw")
  qz = yield tfd.Normal(loc=qz_mean, scale=qz_stddv, name="qz")

surrogate_posterior = tfd.JointDistributionCoroutineAutoBatched(
    factored_normal_variational_model)

losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn,
    surrogate_posterior=surrogate_posterior,
    optimizer=tf.optimizers.Adam(learning_rate=0.05),
    num_steps=200)
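
The standard deviations above are wrapped in a `TransformedVariable` with a `Softplus` bijector so that the optimizer works on an unconstrained real number while the model always sees a positive scale. The following standard-library sketch illustrates that idea only; the helper names are hypothetical and this is not how TFP implements it internally.

```python
import math

def softplus(u):
    """Map any real number to a positive one: log(1 + exp(u))."""
    return math.log1p(math.exp(u))

def softplus_inverse(y):
    """Inverse of softplus, valid for y > 0."""
    return math.log(math.expm1(y))

initial_stddev = 1e-4
# The unconstrained value is what an optimizer would actually update.
unconstrained = softplus_inverse(initial_stddev)

# Wherever a gradient step moves the unconstrained value,
# the transformed value stays strictly positive.
for u in (unconstrained - 5.0, unconstrained, unconstrained + 5.0):
    print(f"unconstrained={u:.4f} -> stddev={softplus(u):.3e}")
```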

In [ ]:

print("Inferred axes:")
print(qw_mean)
print("Standard Deviation:")
print(qw_stddv)

plt.plot(losses)
plt.show()

Inferred axes:
<tf.Variable 'Variable:0' shape=(2, 1) dtype=float32, numpy=
array([[ 2.4168603],
       [-1.2236133]], dtype=float32)>
Standard Deviation:
<TransformedVariable: dtype=float32, shape=[2, 1], fn="softplus", numpy=
array([[0.0042499 ],
       [0.00598824]], dtype=float32)>

In [ ]:

posterior_samples = surrogate_posterior.sample(50)
_, _, x_generated = model.sample(value=(posterior_samples))

# It's a pain to plot all 5000 points for each of our 50 posterior samples, so
# let's subsample to get the gist of the distribution.
x_generated = tf.reshape(tf.transpose(x_generated, [1, 0, 2]), (2, -1))[:, ::47]

plt.scatter(x_train[0, :], x_train[1, :], color='blue', alpha=0.1, label='Actual data')
plt.scatter(x_generated[0, :], x_generated[1, :], color='red', alpha=0.1, label='Simulated data (VI)')
plt.legend()
plt.axis([-20, 20, -20, 20])
plt.show()

Acknowledgements

This tutorial was originally written in Edward 1.0 (source). We thank all contributors to writing and revising that version.

References

[1]: Michael E. Tipping and Christopher M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3): 611-622, 1999.