GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/probability/examples/Fitting_DPMM_Using_pSGLD.ipynb

Kernel: Python 3

Copyright 2018 The TensorFlow Probability Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

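Fitting a Dirichlet Process Mixture Model Using Preconditioned Stochastic Gradient Langevin Dynamics

In this notebook, we demonstrate how to cluster a large number of samples and infer the number of clusters at the same time by fitting a Dirichlet Process Mixture of Gaussian distributions. We use Preconditioned Stochastic Gradient Langevin Dynamics (pSGLD) for inference.

1. Samples

First, we set up a toy dataset. We generate 50,000 random samples from three bivariate Gaussian distributions.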

In [ ]:

import time
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.compat.v1 as tf
import tensorflow_probability as tfp

In [ ]:

plt.style.use('ggplot')
tfd = tfp.distributions

In [ ]:

def session_options(enable_gpu_ram_resizing=True):
  """Convenience function which sets common `tf.Session` options."""
  config = tf.ConfigProto()
  config.log_device_placement = True
  if enable_gpu_ram_resizing:
    # `allow_growth=True` makes it possible to connect multiple colabs to your
    # GPU. Otherwise the colab malloc's all GPU ram.
    config.gpu_options.allow_growth = True
  return config

def reset_sess(config=None):
  """Convenience function to create the TF graph and session, or reset them."""
  if config is None:
    config = session_options()
  tf.reset_default_graph()
  global sess
  try:
    sess.close()
  except:
    pass
  sess = tf.InteractiveSession(config=config)

In [ ]:

# For reproducibility
rng = np.random.RandomState(seed=45)
tf.set_random_seed(76)

# Precision
dtype = np.float64

# Number of training samples
num_samples = 50000

# Ground truth loc values which we will infer later on. The scale is 1.
true_loc = np.array([[-4, -4],
                     [0, 0],
                     [4, 4]], dtype)

true_components_num, dims = true_loc.shape

# Generate training samples from ground truth loc
true_hidden_component = rng.randint(0, true_components_num, num_samples)
observations = (true_loc[true_hidden_component]
                + rng.randn(num_samples, dims).astype(dtype))

In [ ]:

# Visualize samples
plt.scatter(observations[:, 0], observations[:, 1], 1)
plt.axis([-10, 10, -10, 10])
plt.show()

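2. Model

Here, we define a Dirichlet Process Mixture of Gaussian distributions with a symmetric Dirichlet prior. Throughout the notebook, vector quantities are written in bold. Over $i\in\{1,\ldots,N\}$ samples, the model with a mixture of $j \in\{1,\ldots,K\}$ Gaussian distributions is formulated as follows: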

\begin{align*} p(\boldsymbol{x}_1,\cdots, \boldsymbol{x}_N) &=\prod_{i=1}^N \text{GMM}(x_i), \\ &\quad \text{with}\;\text{GMM}(x_i)=\sum_{j=1}^K\pi_j\text{Normal}(x_i\,|\,\text{loc}=\boldsymbol{\mu_{j}},\,\text{scale}=\boldsymbol{\sigma_{j}}) \end{align*}

\begin{align*} x_i&\sim \text{Normal}(\text{loc}=\boldsymbol{\mu}_{z_i},\,\text{scale}=\boldsymbol{\sigma}_{z_i}) \\ z_i &\sim \text{Categorical}(\text{prob}=\boldsymbol{\pi}),\\ &\quad \text{with}\;\boldsymbol{\pi}=\{\pi_1,\cdots,\pi_K\}\\ \boldsymbol{\pi}&\sim\text{Dirichlet}(\text{concentration}=\{\frac{\alpha}{K},\cdots,\frac{\alpha}{K}\})\\ \alpha&\sim \text{InverseGamma}(\text{concentration}=1,\,\text{rate}=1)\\ \boldsymbol{\mu_j} &\sim \text{Normal}(\text{loc}=\boldsymbol{0}, \,\text{scale}=\boldsymbol{1})\\ \boldsymbol{\sigma_j} &\sim \text{InverseGamma}(\text{concentration}=\boldsymbol{1},\,\text{rate}=\boldsymbol{1}) \end{align*}

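Our goal is to assign each $x_i$ to the $j$th cluster through $z_i$, which represents the inferred index of a cluster.

For an ideal Dirichlet Mixture Model, $K$ is set to $\infty$. However, it is known that a Dirichlet Mixture Model can be approximated with a sufficiently large $K$. Although we set the initial value of $K$ arbitrarily, the optimal number of clusters is also inferred through optimization, unlike in a simple Gaussian Mixture Model.

In this notebook, we use a bivariate Gaussian distribution as the mixture component and set $K$ to 30.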
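To get a feel for why a symmetric Dirichlet prior with concentration $\alpha/K$ can leave most of the $K$ components nearly empty, the following minimal sketch draws one set of mixture weights from such a prior. It is not part of the notebook's model code: the helper name `sample_symmetric_dirichlet` is hypothetical, and it relies only on the standard construction of a Dirichlet draw from normalized Gamma draws.

```python
import random

def sample_symmetric_dirichlet(alpha, num_components, rng):
    """Draw weights from Dirichlet(alpha/K, ..., alpha/K).

    Standard construction: normalize independent Gamma(alpha/K, 1) draws.
    """
    shape = alpha / num_components
    gammas = [rng.gammavariate(shape, 1.0) for _ in range(num_components)]
    total = sum(gammas)
    return [g / total for g in gammas]

# Illustrative values: alpha = 1 and K = 30, matching the upper bound on K.
rng = random.Random(0)
weights = sample_symmetric_dirichlet(alpha=1.0, num_components=30, rng=rng)

# Count how many components receive more than 1% of the total mass.
num_significant = sum(w > 0.01 for w in weights)
print("components with weight > 1%:", num_significant)
```

With a small per-component concentration, a typical draw puts almost all of the mass on a handful of components, which is the behavior the finite approximation relies on.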

In [ ]:

reset_sess()

# Upperbound on K
max_cluster_num = 30

# Define trainable variables.
mix_probs = tf.nn.softmax(
    tf.Variable(
        name='mix_probs',
        initial_value=np.ones([max_cluster_num], dtype) / max_cluster_num))

loc = tf.Variable(
    name='loc',
    initial_value=np.random.uniform(
        low=-9, #set around minimum value of sample value
        high=9, #set around maximum value of sample value
        size=[max_cluster_num, dims]))

precision = tf.nn.softplus(tf.Variable(
    name='precision',
    initial_value=
    np.ones([max_cluster_num, dims], dtype=dtype)))

alpha = tf.nn.softplus(tf.Variable(
    name='alpha',
    initial_value=
    np.ones([1], dtype=dtype)))

training_vals = [mix_probs, alpha, loc, precision]


# Prior distributions of the training variables

#Use symmetric Dirichlet prior as finite approximation of Dirichlet process.
rv_symmetric_dirichlet_process = tfd.Dirichlet(
    concentration=np.ones(max_cluster_num, dtype) * alpha / max_cluster_num,
    name='rv_sdp')

rv_loc = tfd.Independent(
    tfd.Normal(
        loc=tf.zeros([max_cluster_num, dims], dtype=dtype),
        scale=tf.ones([max_cluster_num, dims], dtype=dtype)),
    reinterpreted_batch_ndims=1,
    name='rv_loc')


rv_precision = tfd.Independent(
    tfd.InverseGamma(
        concentration=np.ones([max_cluster_num, dims], dtype),
        rate=np.ones([max_cluster_num, dims], dtype)),
    reinterpreted_batch_ndims=1,
    name='rv_precision')

rv_alpha = tfd.InverseGamma(
    concentration=np.ones([1], dtype=dtype),
    rate=np.ones([1]),
    name='rv_alpha')

# Define mixture model
rv_observations = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=mix_probs),
    components_distribution=tfd.MultivariateNormalDiag(
        loc=loc,
        scale_diag=precision))

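3. Optimization

We optimize the model with Preconditioned Stochastic Gradient Langevin Dynamics (pSGLD), which makes it possible to optimize a model over a large number of samples in a mini-batch gradient descent manner.

To update the parameters $\boldsymbol{\theta}\equiv\{\boldsymbol{\pi},\,\alpha,\, \boldsymbol{\mu_j},\,\boldsymbol{\sigma_j}\}$ at the $t\,$th iteration with mini-batch size $M$, the update is sampled as: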

$$\begin{align*} \Delta \boldsymbol{\theta}_{t} & \sim \frac{\epsilon_{t}}{2} \bigl[ G\left(\boldsymbol{\theta}_{t}\right) \bigl( \nabla_{\boldsymbol{\theta}} \log p\left(\boldsymbol{\theta}_{t}\right) + \frac{N}{M} \sum_{k=1}^{M} \nabla_{\boldsymbol{\theta}} \log \text{GMM}(x_{t_k})\bigr) + \sum_{\boldsymbol{\theta}}\nabla_\theta G\left(\boldsymbol{\theta}_{t}\right) \bigr]\\ &+ G^{\frac{1}{2}}\left(\boldsymbol{\theta}_{t}\right) \text{Normal}\left(\text{loc}=\boldsymbol{0},\, \text{scale}=\epsilon_{t}\boldsymbol{1}\right) \end{align*}$$

In the above equation, $\epsilon_{t}$ is the learning rate at the $t\,$th iteration, and $\log p(\theta_t)$ is the sum of the log prior distributions of $\theta$. $G(\boldsymbol{\theta}_{t})$ is a preconditioner which adjusts the scale of the gradient of each parameter.
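As a rough illustration of the structure of this update, here is a simplified scalar sketch. It is not the TensorFlow Probability implementation, and it makes several assumptions: the preconditioner is an RMSProp-style inverse root of a running average of squared gradients, the term involving the gradient of $G$ is dropped, and the injected noise is given variance `lr * g`, a common convention for preconditioned Langevin updates. The function name `psgld_step` and its arguments are hypothetical.

```python
import math
import random

def psgld_step(theta, grad_log_post, v, lr, z, decay=0.99, bias=1e-8):
    """One simplified scalar preconditioned Langevin update.

    theta: current parameter value.
    grad_log_post: stochastic gradient of the log posterior at theta.
    v: running average of squared gradients (RMSProp statistic).
    lr: learning rate epsilon_t.
    z: a standard normal draw supplied by the caller.
    """
    v = decay * v + (1.0 - decay) * grad_log_post ** 2
    g = 1.0 / math.sqrt(v + bias)          # diagonal preconditioner G
    drift = 0.5 * lr * g * grad_log_post   # preconditioned step up the log posterior
    noise = math.sqrt(lr * g) * z          # assumed convention: noise variance lr * g
    return theta + drift + noise, v

# Toy target: a standard normal "posterior", whose log-density gradient is -theta.
rng = random.Random(0)
theta, v = 3.0, 1.0
samples = []
for step in range(20000):
    theta, v = psgld_step(theta, -theta, v, lr=0.01, z=rng.gauss(0.0, 1.0))
    if step >= 2000:  # discard early draws as burn-in
        samples.append(theta)

sample_mean = sum(samples) / len(samples)
print("mean of draws after burn-in:", round(sample_mean, 2))
```

Without the noise term this reduces to a plain RMSProp-style ascent step. The noise is what turns the successive iterates into approximate posterior samples rather than a single point estimate.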

In [ ]:

# Learning rates and decay
starter_learning_rate = 1e-6
end_learning_rate = 1e-10
decay_steps = 1e4

# Number of training steps
training_steps = 10000

# Mini-batch size
batch_size = 20

# Sample size for parameter posteriors
sample_size = 100
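The learning-rate settings above describe a decay from `starter_learning_rate` to `end_learning_rate` over `decay_steps` steps. The sketch below writes that schedule out in plain Python, assuming the polynomial decay formula (without cycling) that TensorFlow documents for `tf.train.polynomial_decay`. The standalone function here is illustrative and not a TensorFlow API.

```python
def polynomial_decay(step, start_lr, end_lr, decay_steps, power=1.0):
    """Learning rate after `step` updates under polynomial decay."""
    step = min(step, decay_steps)             # the rate stays at end_lr afterwards
    fraction_left = 1.0 - step / decay_steps
    return (start_lr - end_lr) * fraction_left ** power + end_lr

# Values taken from the hyperparameter cell above.
for step in (0, 5000, 10000):
    print(step, polynomial_decay(step, 1e-6, 1e-10, 1e4))
```

With `power=1.` the schedule is a straight line between the two rates.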

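We use the joint log probability of the likelihood $\text{GMM}(x_{t_k})$ and the prior probabilities $p(\theta_t)$ as the loss function for pSGLD.

Note that, as specified in the API of pSGLD, we need to divide the sum of the prior probabilities by the sample size $N$.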
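The reason for this scaling can be checked with simple arithmetic. Assuming the optimizer multiplies the objective's gradient by `data_size` (which is what the `data_size` argument suggests), dividing the log prior by $N$ and the mini-batch log likelihood by $M$ recovers the usual mini-batch estimate $\log p(\theta) + \frac{N}{M}\sum_k \log \text{GMM}(x_{t_k})$. The numbers and function names below are made up for illustration.

```python
def scaled_objective(log_prior, batch_log_liks, num_samples):
    """Per-example objective: prior / N plus the mean mini-batch log-likelihood."""
    batch_size = len(batch_log_liks)
    return log_prior / num_samples + sum(batch_log_liks) / batch_size

def full_data_estimate(log_prior, batch_log_liks, num_samples):
    """Mini-batch estimate of the full log joint: prior + (N / M) * sum of log-likelihoods."""
    batch_size = len(batch_log_liks)
    return log_prior + num_samples / batch_size * sum(batch_log_liks)

# Hypothetical values for one mini-batch of size M = 4.
log_prior = -12.5
batch_log_liks = [-3.1, -2.7, -4.0, -3.3]
N = 50000

lhs = N * scaled_objective(log_prior, batch_log_liks, N)
rhs = full_data_estimate(log_prior, batch_log_liks, N)
print(abs(lhs - rhs) < 1e-6)
```

Because multiplying by a constant commutes with taking gradients, the same identity holds for the gradients the optimizer actually uses.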

In [ ]:

# Placeholder for mini-batch
observations_tensor = tf.compat.v1.placeholder(dtype, shape=[batch_size, dims])

# Define joint log probabilities
# Notice that each prior probability should be divided by num_samples and
# likelihood is divided by batch_size for pSGLD optimization.
log_prob_parts = [
    rv_loc.log_prob(loc) / num_samples,
    rv_precision.log_prob(precision) / num_samples,
    rv_alpha.log_prob(alpha) / num_samples,
    rv_symmetric_dirichlet_process.log_prob(mix_probs)[..., tf.newaxis]
    / num_samples,
    rv_observations.log_prob(observations_tensor) / batch_size
]
joint_log_prob = tf.reduce_sum(tf.concat(log_prob_parts, axis=-1), axis=-1)

In [ ]:

# Make mini-batch generator
dx = tf.compat.v1.data.Dataset.from_tensor_slices(observations)\
  .shuffle(500).repeat().batch(batch_size)
iterator = tf.compat.v1.data.make_one_shot_iterator(dx)
next_batch = iterator.get_next()

# Define learning rate scheduling
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.polynomial_decay(
    starter_learning_rate,
    global_step, decay_steps,
    end_learning_rate, power=1.)

# Set up the optimizer. Don't forget to set data_size=num_samples.
optimizer_kernel = tfp.optimizer.StochasticGradientLangevinDynamics(
    learning_rate=learning_rate,
    preconditioner_decay_rate=0.99,
    burnin=1500,
    data_size=num_samples)

train_op = optimizer_kernel.minimize(-joint_log_prob)

# Arrays to store samples
mean_mix_probs_mtx = np.zeros([training_steps, max_cluster_num])
mean_alpha_mtx = np.zeros([training_steps, 1])
mean_loc_mtx = np.zeros([training_steps, max_cluster_num, dims])
mean_precision_mtx = np.zeros([training_steps, max_cluster_num, dims])

init = tf.global_variables_initializer()
sess.run(init)

start = time.time()
for it in range(training_steps):
  [
      mean_mix_probs_mtx[it, :],
      mean_alpha_mtx[it, 0],
      mean_loc_mtx[it, :, :],
      mean_precision_mtx[it, :, :],
      _
  ] = sess.run([
      *training_vals,
      train_op
  ], feed_dict={
      observations_tensor: sess.run(next_batch)})

elapsed_time_psgld = time.time() - start
print("Elapsed time: {} seconds".format(elapsed_time_psgld))

# Take mean over the last sample_size iterations
mean_mix_probs_ = mean_mix_probs_mtx[-sample_size:, :].mean(axis=0)
mean_alpha_ = mean_alpha_mtx[-sample_size:, :].mean(axis=0)
mean_loc_ = mean_loc_mtx[-sample_size:, :].mean(axis=0)
mean_precision_ = mean_precision_mtx[-sample_size:, :].mean(axis=0)

Elapsed time: 309.8013095855713 seconds

4. Visualize the result

4.1. Clustered result

First, we visualize the result of clustering.

To assign each sample $x_i$ to a cluster $j$, we compute the posterior of $z_i$ and take the most probable value:

\begin{align*} j = \underset{z_i}{\arg\max}\,p(z_i\,|\,x_i,\,\boldsymbol{\theta}) \end{align*}
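For a single point and a single set of parameters, this assignment rule can be sketched in plain Python as follows. The helper names are hypothetical, the component parameters are the ground-truth values of the toy dataset with equal mixture weights, and normalization is done in log space with the log-sum-exp trick, as the TensorFlow code in the next cell also does.

```python
import math

def log_normal_diag(x, loc, scale):
    """Log-density of a diagonal Gaussian at point x."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, loc, scale))

def cluster_posterior(x, mix_probs, locs, scales):
    """Return p(z = j | x, theta) for every component j (mix_probs must be positive)."""
    log_unnormalized = [
        math.log(p) + log_normal_diag(x, loc, scale)
        for p, loc, scale in zip(mix_probs, locs, scales)]
    # Subtract the maximum before exponentiating (log-sum-exp) for stability.
    top = max(log_unnormalized)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_unnormalized))
    return [math.exp(v - log_norm) for v in log_unnormalized]

locs = [(-4.0, -4.0), (0.0, 0.0), (4.0, 4.0)]
scales = [(1.0, 1.0)] * 3
mix_probs = [1.0 / 3.0] * 3

posterior_probs = cluster_posterior((3.8, 4.1), mix_probs, locs, scales)
assigned = max(range(len(posterior_probs)), key=posterior_probs.__getitem__)
print("assigned component:", assigned)  # the point lies next to (4, 4), so index 2
```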

In [ ]:

loc_for_posterior = tf.compat.v1.placeholder(
    dtype, [None, max_cluster_num, dims], name='loc_for_posterior')
precision_for_posterior = tf.compat.v1.placeholder(
    dtype, [None, max_cluster_num, dims], name='precision_for_posterior')
mix_probs_for_posterior = tf.compat.v1.placeholder(
    dtype, [None, max_cluster_num], name='mix_probs_for_posterior')

# Posterior of z (unnormalized)
unnormalized_posterior = tfd.MultivariateNormalDiag(
    loc=loc_for_posterior, scale_diag=precision_for_posterior)\
   .log_prob(tf.expand_dims(tf.expand_dims(observations, axis=1), axis=1))\
   + tf.log(mix_probs_for_posterior[tf.newaxis, ...])

# Posterior of z (normalized over latent states)
posterior = unnormalized_posterior\
  - tf.reduce_logsumexp(unnormalized_posterior, axis=-1)[..., tf.newaxis]

cluster_asgmt = sess.run(tf.argmax(
    tf.reduce_mean(posterior, axis=1), axis=1), feed_dict={
        loc_for_posterior: mean_loc_mtx[-sample_size:, :],
        precision_for_posterior: mean_precision_mtx[-sample_size:, :],
        mix_probs_for_posterior: mean_mix_probs_mtx[-sample_size:, :]})

idxs, count = np.unique(cluster_asgmt, return_counts=True)

print('Number of inferred clusters = {}\n'.format(len(count)))
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})

print('Number of elements in each cluster = {}\n'.format(count))

def convert_int_elements_to_consecutive_numbers_in(array):
  unique_int_elements = np.unique(array)
  for consecutive_number, unique_int_element in enumerate(unique_int_elements):
    array[array == unique_int_element] = consecutive_number
  return array

cmap = plt.get_cmap('tab10')
plt.scatter(
    observations[:, 0], observations[:, 1],
    1,
    c=cmap(convert_int_elements_to_consecutive_numbers_in(cluster_asgmt)))
plt.axis([-10, 10, -10, 10])
plt.show()

Number of inferred clusters = 3

Number of elements in each cluster = [16911 16645 16444]

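We can see that an almost equal number of samples is assigned to each of the appropriate clusters, and that the model has successfully inferred the correct number of clusters.

4.2. Visualize uncertainty

Here, we look at the uncertainty of the clustering result by visualizing it for each sample.

We calculate the uncertainty using entropy: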

\begin{align*} \text{Uncertainty}_\text{entropy} = -\frac{1}{K}\sum^{K}_{z_i=1}\sum^{O}_{l=1}p(z_i\,|\,x_i,\,\boldsymbol{\theta}_l)\log p(z_i\,|\,x_i,\,\boldsymbol{\theta}_l) \end{align*}

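In pSGLD, we treat the value of a training parameter at each iteration as a sample from its posterior distribution. We therefore calculate the entropy over the values from $O$ iterations for each parameter. The final entropy value is obtained by averaging the entropies of all the cluster assignments.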
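The entropy formula above can be written out for one sample as a small plain-Python sketch. The function name `entropy_uncertainty` and the example probabilities are hypothetical; the inner lists stand for the posterior over $K$ clusters under each of $O$ stored parameter draws.

```python
import math

def entropy_uncertainty(posteriors):
    """Entropy-based uncertainty for one sample.

    posteriors: list of O lists, each holding K probabilities
    p(z = j | x, theta_l) for one stored parameter draw theta_l.
    """
    num_components = len(posteriors[0])
    total = 0.0
    for probs in posteriors:
        for p in probs:
            if p > 0.0:  # treat 0 * log(0) as 0
                total += p * math.log(p)
    return -total / num_components

# A point deep inside a cluster versus a point between two clusters.
confident = entropy_uncertainty([[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]])
ambiguous = entropy_uncertainty([[0.5, 0.5, 0.0], [0.45, 0.55, 0.0]])
print(confident < ambiguous)
```

A point whose posterior mass is split between clusters gets a larger value than a point that is assigned to one cluster with near certainty.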

In [ ]:

# Calculate entropy
posterior_in_exponential = tf.exp(posterior)
uncertainty_in_entropy = tf.reduce_mean(-tf.reduce_sum(
    posterior_in_exponential
    * posterior,
    axis=1), axis=1)

uncertainty_in_entropy_ = sess.run(uncertainty_in_entropy, feed_dict={
    loc_for_posterior: mean_loc_mtx[-sample_size:, :],
    precision_for_posterior: mean_precision_mtx[-sample_size:, :],
    mix_probs_for_posterior: mean_mix_probs_mtx[-sample_size:, :]
})

In [ ]:

plt.title('Entropy')
sc = plt.scatter(observations[:, 0],
                 observations[:, 1],
                 1,
                 c=uncertainty_in_entropy_,
                 cmap=plt.cm.viridis_r)
cbar = plt.colorbar(sc,
                    fraction=0.046,
                    pad=0.04,
                    ticks=[uncertainty_in_entropy_.min(),
                           uncertainty_in_entropy_.max()])
cbar.ax.set_yticklabels(['low', 'high'])
cbar.set_label('Uncertainty', rotation=270)
plt.show()

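In the above graph, lower luminance (a darker color) represents higher uncertainty. We can see that the samples near the boundaries of the clusters have especially high uncertainty. This matches intuition, because those samples are the hardest to cluster.

4.3. Mean and scale of selected mixture components

Next, we look at the $\mu_j$ and $\sigma_j$ of the selected clusters.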

In [ ]:

for idx, number_of_samples in zip(idxs, count):
  print(
      'Component id = {}, Number of elements = {}'
      .format(idx, number_of_samples))
  print(
      'Mean loc = {}, Mean scale = {}\n'
      .format(mean_loc_[idx, :], mean_precision_[idx, :]))

Component id = 0, Number of elements = 16911
Mean loc = [-4.030 -4.113], Mean scale = [ 0.994  0.972]

Component id = 4, Number of elements = 16645
Mean loc = [ 3.999  4.069], Mean scale = [ 1.038  1.046]

Component id = 5, Number of elements = 16444
Mean loc = [-0.005 -0.023], Mean scale = [ 0.967  1.025]

The inferred $\boldsymbol{\mu_j}$ and $\boldsymbol{\sigma_j}$ are close to the ground truth.

4.4. Mixture weight of each mixture component

We also look at the inferred mixture weights.

In [ ]:

plt.ylabel('Mean posterior of mixture weight')
plt.xlabel('Component')
plt.bar(range(0, max_cluster_num), mean_mix_probs_)
plt.show()

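We see that only a few (three) mixture components have significant weights, while the remaining weights are close to zero. This also shows that the model successfully inferred the correct number of mixture components that make up the distribution of the samples.

4.5. Convergence of $\alpha$

We look at the convergence of the Dirichlet distribution's concentration parameter $\alpha$.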

In [ ]:

print('Value of inferred alpha = {0:.3f}\n'.format(mean_alpha_[0]))
plt.ylabel('Sample value of alpha')
plt.xlabel('Iteration')
plt.plot(mean_alpha_mtx)
plt.show()

Value of inferred alpha = 0.679

Given that a smaller $\alpha$ leads to a smaller expected number of clusters in a Dirichlet mixture model, the model appears to be learning the optimal number of clusters over the iterations.
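The link between $\alpha$ and the expected number of clusters can be made concrete for an exact Dirichlet process. Under its Chinese restaurant process representation, the $i$th sample (counting from zero) opens a new cluster with probability $\alpha/(\alpha+i)$, so the prior expected number of occupied clusters is the sum of these probabilities. The sketch below evaluates that sum. It describes the prior of the infinite model, not the posterior of the truncated $K=30$ model fitted here, and the function name is hypothetical.

```python
def expected_num_clusters(alpha, num_samples):
    """Prior expected number of occupied clusters under a Dirichlet process.

    Under the Chinese restaurant process, sample i (0-based) opens a new
    cluster with probability alpha / (alpha + i).
    """
    return sum(alpha / (alpha + i) for i in range(num_samples))

# Compare a few concentration values at the dataset size used in this notebook.
for alpha in (0.1, 0.679, 5.0):
    print(alpha, round(expected_num_clusters(alpha, 50000), 2))
```

The expectation grows with $\alpha$, which is the monotone relationship the paragraph above relies on.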

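4.6. Inferred number of clusters over iterations

We visualize how the inferred number of clusters changes over the iterations.

To do so, we infer the number of clusters at regular points during training.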
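The evaluation points are windows of `sample_size` consecutive stored parameter values, spaced evenly across training. The following sketch reproduces only that index arithmetic, using the same values as the cell below. The helper name `evaluation_windows` is hypothetical.

```python
def evaluation_windows(training_steps, window, num_points):
    """Return (start, end) index pairs of equally spaced windows over stored samples."""
    interval = (training_steps - window) // (num_points - 1)
    return [(end - window, end)
            for end in range(window, training_steps + 1, interval)]

windows = evaluation_windows(training_steps=10000, window=100, num_points=50)
print(len(windows), windows[0], windows[-1])  # 50 (0, 100) (9898, 9998)
```

Each window is then fed to the posterior computation in place of the final `sample_size` samples.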

In [ ]:

step = sample_size
num_of_iterations = 50
estimated_num_of_clusters = []
interval = (training_steps - step) // (num_of_iterations - 1)
iterations = np.asarray(range(step, training_steps+1, interval))
for iteration in iterations:
  start_position = iteration-step
  end_position = iteration

  result = sess.run(tf.argmax(
      tf.reduce_mean(posterior, axis=1), axis=1), feed_dict={
          loc_for_posterior:
              mean_loc_mtx[start_position:end_position, :],
          precision_for_posterior:
              mean_precision_mtx[start_position:end_position, :],
          mix_probs_for_posterior:
              mean_mix_probs_mtx[start_position:end_position, :]})

  idxs, count = np.unique(result, return_counts=True)
  estimated_num_of_clusters.append(len(count))

In [ ]:

plt.ylabel('Number of inferred clusters')
plt.xlabel('Iteration')
plt.yticks(np.arange(1, max(estimated_num_of_clusters) + 1, 1))
plt.plot(iterations - 1, estimated_num_of_clusters)
plt.show()

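Over the iterations, the number of clusters approaches three. Together with the convergence of $\alpha$ to a smaller value, this shows that the model is successfully learning the parameters needed to infer the optimal number of clusters.

Interestingly, the inference converged to the correct number of clusters in the early iterations, whereas $\alpha$ converged much later.

4.7. Fitting the model using RMSProp

In this section, we fit the model with RMSProp to see how effective the Monte Carlo sampling scheme of pSGLD is. We choose RMSProp for the comparison because it has no sampling scheme and because pSGLD is based on RMSProp.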

In [ ]:

# Learning rates and decay
starter_learning_rate_rmsprop = 1e-2
end_learning_rate_rmsprop = 1e-4
decay_steps_rmsprop = 1e4

# Number of training steps
training_steps_rmsprop = 50000

# Mini-batch size
batch_size_rmsprop = 20

In [ ]:

# Define trainable variables.
mix_probs_rmsprop = tf.nn.softmax(
    tf.Variable(
        name='mix_probs_rmsprop',
        initial_value=np.ones([max_cluster_num], dtype) / max_cluster_num))

loc_rmsprop = tf.Variable(
    name='loc_rmsprop',
    initial_value=np.zeros([max_cluster_num, dims], dtype)
    + np.random.uniform(
        low=-9, #set around minimum value of sample value
        high=9, #set around maximum value of sample value
        size=[max_cluster_num, dims]))

precision_rmsprop = tf.nn.softplus(tf.Variable(
    name='precision_rmsprop',
    initial_value=
    np.ones([max_cluster_num, dims], dtype=dtype)))

alpha_rmsprop = tf.nn.softplus(tf.Variable(
    name='alpha_rmsprop',
    initial_value=
    np.ones([1], dtype=dtype)))

training_vals_rmsprop =\
    [mix_probs_rmsprop, alpha_rmsprop, loc_rmsprop, precision_rmsprop]

# Prior distributions of the training variables

#Use symmetric Dirichlet prior as finite approximation of Dirichlet process.
rv_symmetric_dirichlet_process_rmsprop = tfd.Dirichlet(
    concentration=np.ones(max_cluster_num, dtype)
    * alpha_rmsprop / max_cluster_num,
    name='rv_sdp_rmsprop')

rv_loc_rmsprop = tfd.Independent(
    tfd.Normal(
        loc=tf.zeros([max_cluster_num, dims], dtype=dtype),
        scale=tf.ones([max_cluster_num, dims], dtype=dtype)),
    reinterpreted_batch_ndims=1,
    name='rv_loc_rmsprop')


rv_precision_rmsprop = tfd.Independent(
    tfd.InverseGamma(
        concentration=np.ones([max_cluster_num, dims], dtype),
        rate=np.ones([max_cluster_num, dims], dtype)),
    reinterpreted_batch_ndims=1,
    name='rv_precision_rmsprop')

rv_alpha_rmsprop = tfd.InverseGamma(
    concentration=np.ones([1], dtype=dtype),
    rate=np.ones([1]),
    name='rv_alpha_rmsprop')

# Define mixture model
rv_observations_rmsprop = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=mix_probs_rmsprop),
    components_distribution=tfd.MultivariateNormalDiag(
        loc=loc_rmsprop,
        scale_diag=precision_rmsprop))

In [ ]:

log_prob_parts_rmsprop = [
    rv_loc_rmsprop.log_prob(loc_rmsprop),
    rv_precision_rmsprop.log_prob(precision_rmsprop),
    rv_alpha_rmsprop.log_prob(alpha_rmsprop),
    rv_symmetric_dirichlet_process_rmsprop
        .log_prob(mix_probs_rmsprop)[..., tf.newaxis],
    rv_observations_rmsprop.log_prob(observations_tensor)
    * num_samples / batch_size
]
joint_log_prob_rmsprop = tf.reduce_sum(
    tf.concat(log_prob_parts_rmsprop, axis=-1), axis=-1)

In [ ]:

# Define learning rate scheduling
global_step_rmsprop = tf.Variable(0, trainable=False)
learning_rate = tf.train.polynomial_decay(
    starter_learning_rate_rmsprop,
    global_step_rmsprop, decay_steps_rmsprop,
    end_learning_rate_rmsprop, power=1.)

# Set up the RMSProp optimizer.
optimizer_kernel_rmsprop = tf.train.RMSPropOptimizer(
    learning_rate=learning_rate,
    decay=0.99)

train_op_rmsprop = optimizer_kernel_rmsprop.minimize(-joint_log_prob_rmsprop)

init_rmsprop = tf.global_variables_initializer()
sess.run(init_rmsprop)

start = time.time()
for it in range(training_steps_rmsprop):
  [
      _
  ] = sess.run([
      train_op_rmsprop
  ], feed_dict={
      observations_tensor: sess.run(next_batch)})

elapsed_time_rmsprop = time.time() - start
print("RMSProp elapsed_time: {} seconds ({} iterations)"
      .format(elapsed_time_rmsprop, training_steps_rmsprop))
print("pSGLD elapsed_time: {} seconds ({} iterations)"
      .format(elapsed_time_psgld, training_steps))

mix_probs_rmsprop_, alpha_rmsprop_, loc_rmsprop_, precision_rmsprop_ =\
  sess.run(training_vals_rmsprop)

RMSProp elapsed_time: 53.7574200630188 seconds (50000 iterations)
pSGLD elapsed_time: 309.8013095855713 seconds (10000 iterations)

Although RMSProp runs for more iterations than pSGLD, the optimization with RMSProp is much faster.
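The difference is easier to read per iteration. The short calculation below uses only the elapsed times and step counts printed above. Note that the pSGLD loop also fetched every parameter value at each step to store it as a sample, so part of the gap may come from that rather than from the optimizer itself.

```python
# Elapsed times and iteration counts as printed by the training cells above.
psgld_seconds, psgld_steps = 309.8013095855713, 10000
rmsprop_seconds, rmsprop_steps = 53.7574200630188, 50000

psgld_per_step = psgld_seconds / psgld_steps
rmsprop_per_step = rmsprop_seconds / rmsprop_steps
ratio = psgld_per_step / rmsprop_per_step

print("pSGLD: {:.4f} s/step, RMSProp: {:.4f} s/step, ratio: {:.1f}x".format(
    psgld_per_step, rmsprop_per_step, ratio))
```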

Next, we look at the clustering result.

In [ ]:

cluster_asgmt_rmsprop = sess.run(tf.argmax(
    tf.reduce_mean(posterior, axis=1), axis=1), feed_dict={
        loc_for_posterior: loc_rmsprop_[tf.newaxis, :],
        precision_for_posterior: precision_rmsprop_[tf.newaxis, :],
        mix_probs_for_posterior: mix_probs_rmsprop_[tf.newaxis, :]})

idxs, count = np.unique(cluster_asgmt_rmsprop, return_counts=True)

print('Number of inferred clusters = {}\n'.format(len(count)))
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})

print('Number of elements in each cluster = {}\n'.format(count))

cmap = plt.get_cmap('tab10')
plt.scatter(
    observations[:, 0], observations[:, 1],
    1,
    c=cmap(convert_int_elements_to_consecutive_numbers_in(
        cluster_asgmt_rmsprop)))
plt.axis([-10, 10, -10, 10])
plt.show()

Number of inferred clusters = 4

Number of elements in each cluster = [ 1644 15267 16647 16442]

In our experiment, RMSProp optimization did not infer the correct number of clusters. We also look at the mixture weights.

In [ ]:

plt.ylabel('MAP inference of mixture weight')
plt.xlabel('Component')
plt.bar(range(0, max_cluster_num), mix_probs_rmsprop_)
plt.show()

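We can see that an incorrect number of components have significant mixture weights.

Although its optimization takes longer, pSGLD, which has a Monte Carlo sampling scheme, performed better in our experiment.

5. Conclusion

In this notebook, we described how to cluster a large number of samples and infer the number of clusters at the same time by fitting a Dirichlet Process Mixture of Gaussian distributions with pSGLD.

The experiment showed that the model successfully clustered the samples and inferred the correct number of clusters. We also showed that the Monte Carlo sampling scheme of pSGLD lets us visualize the uncertainty of the result. Beyond clustering the samples, we confirmed that the model can infer the correct parameters of the mixture components. Regarding the relationship between the parameters and the inferred number of clusters, we investigated how the model learns the parameter that controls the effective number of clusters by visualizing the correlation between the convergence of $\alpha$ and the inferred number of clusters. Finally, we looked at the result of fitting the model with RMSProp. RMSProp, an optimizer without a Monte Carlo sampling scheme, runs much faster than pSGLD but produces a less accurate clustering.

The toy dataset had only 50,000 samples in two dimensions, but the mini-batch optimization used here scales to much larger datasets.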
