GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ko/quantum/tutorials/mnist.ipynb

Kernel: Python 3

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

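MNIST classification

This tutorial builds a quantum neural network (QNN) to classify a simplified version of MNIST, similar to the approach used in Farhi et al. The performance of the quantum neural network on this classical data problem is compared with that of a classical neural network.

Setup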

In [ ]:

!pip install tensorflow==2.7.0

Install TensorFlow Quantum:

In [ ]:

!pip install tensorflow-quantum==0.7.2

In [ ]:

# Update package resources to account for version changes.
import importlib, pkg_resources
importlib.reload(pkg_resources)

Now import TensorFlow and the module dependencies:

In [ ]:

import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np
import seaborn as sns
import collections

# visualization tools
%matplotlib inline
import matplotlib.pyplot as plt
from cirq.contrib.svg import SVGCircuit

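1. Load the data

In this tutorial you will build a binary classifier to distinguish between the digits 3 and 6, following Farhi et al. This section covers the following data-handling steps:

Load the raw data from Keras.
Filter the dataset to only 3s and 6s.
Downscale the images so they fit on a quantum computer.
Remove any contradictory examples.
Convert the binary images to Cirq circuits.
Convert the Cirq circuits to TensorFlow Quantum circuits.

1.1 Load the raw data

Load the MNIST dataset distributed with Keras.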

In [ ]:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Rescale the images from [0,255] to the [0.0,1.0] range.
x_train, x_test = x_train[..., np.newaxis]/255.0, x_test[..., np.newaxis]/255.0

print("Number of original training examples:", len(x_train))
print("Number of original test examples:", len(x_test))

Filter the dataset to keep just the 3s and 6s, removing the other classes. At the same time, convert the label, y, to boolean: True for 3 and False for 6.

In [ ]:

def filter_36(x, y):
    keep = (y == 3) | (y == 6)
    x, y = x[keep], y[keep]
    y = y == 3
    return x,y

In [ ]:

x_train, y_train = filter_36(x_train, y_train)
x_test, y_test = filter_36(x_test, y_test)

print("Number of filtered training examples:", len(x_train))
print("Number of filtered test examples:", len(x_test))
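
The filtering step relies on NumPy boolean masks. A minimal sketch on made-up toy labels (the arrays below are hypothetical, not MNIST) shows how the mask selects the two classes and how the comparison turns the labels into booleans:

```python
import numpy as np

# Hypothetical toy data: five "images" (here just scalars) and their digit labels.
toy_x = np.array([10, 11, 12, 13, 14])
toy_y = np.array([3, 7, 6, 3, 0])

# Keep only the examples labelled 3 or 6.
keep = (toy_y == 3) | (toy_y == 6)
kept_x, kept_y = toy_x[keep], toy_y[keep]

# Convert the labels to booleans: True for 3, False for 6.
bool_y = kept_y == 3

print(kept_x)
print(bool_y)
```

Only the entries at positions 0, 2 and 3 survive, and their labels become True, False, True.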

Show the first example:

In [ ]:

print(y_train[0])

plt.imshow(x_train[0, :, :, 0])
plt.colorbar()

1.2 Downscale the images

An image size of 28x28 is much too large for current quantum computers. Resize the images down to 4x4:

In [ ]:

x_train_small = tf.image.resize(x_train, (4,4)).numpy()
x_test_small = tf.image.resize(x_test, (4,4)).numpy()

Again, display the first training example, this time after resizing:

In [ ]:

print(y_train[0])

plt.imshow(x_train_small[0,:,:,0], vmin=0, vmax=1)
plt.colorbar()

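1.3 Remove contradictory examples

Following section 3.3, Learning to Distinguish Digits, of Farhi et al., filter the dataset to remove images that are labeled as belonging to both classes.

This is not a standard machine-learning procedure, but it is included in the interest of following the paper.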

In [ ]:

def remove_contradicting(xs, ys):
    mapping = collections.defaultdict(set)
    orig_x = {}
    # Determine the set of labels for each unique image:
    for x, y in zip(xs, ys):
        orig_x[tuple(x.flatten())] = x
        mapping[tuple(x.flatten())].add(y)

    new_x = []
    new_y = []
    for flatten_x in mapping:
        x = orig_x[flatten_x]
        labels = mapping[flatten_x]
        if len(labels) == 1:
            new_x.append(x)
            new_y.append(next(iter(labels)))
        else:
            # Throw out images that match more than one label.
            pass
    
    num_uniq_3 = sum(1 for value in mapping.values() if len(value) == 1 and True in value)
    num_uniq_6 = sum(1 for value in mapping.values() if len(value) == 1 and False in value)
    num_uniq_both = sum(1 for value in mapping.values() if len(value) == 2)

    print("Number of unique images:", len(mapping.values()))
    print("Number of unique 3s: ", num_uniq_3)
    print("Number of unique 6s: ", num_uniq_6)
    print("Number of unique contradicting labels (both 3 and 6): ", num_uniq_both)
    print()
    print("Initial number of images: ", len(xs))
    print("Remaining non-contradicting unique images: ", len(new_x))
    
    return np.array(new_x), np.array(new_y)

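The resulting counts do not closely match the values reported in the paper, but the paper does not specify the exact procedure.

It is also worth noting that filtering contradictory examples at this point does not fully prevent the model from receiving contradictory training examples: the next step binarizes the data, which causes more collisions.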

In [ ]:

x_train_nocon, y_train_nocon = remove_contradicting(x_train_small, y_train)

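1.4 Encode the data as quantum circuits

To process images using a quantum computer, Farhi et al. proposed representing each pixel with a qubit, with the state depending on the value of the pixel. The first step is to convert to a binary encoding.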

In [ ]:

THRESHOLD = 0.5

x_train_bin = np.array(x_train_nocon > THRESHOLD, dtype=np.float32)
x_test_bin = np.array(x_test_small > THRESHOLD, dtype=np.float32)

If you were to remove contradictory images at this point, you would be left with only 193, which is likely not enough for effective training.

In [ ]:

_ = remove_contradicting(x_train_bin, y_train_nocon)
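
Why binarization creates new contradictions can be seen on a small made-up example. The 2x2 arrays and labels below are hypothetical, and `count_contradictions` is an illustrative helper, not part of the tutorial; the sketch only assumes the same 0.5 threshold used above.

```python
import collections
import numpy as np

THRESHOLD = 0.5

# Two hypothetical grayscale images that differ, with different labels.
img_a = np.array([[0.9, 0.1], [0.2, 0.7]])  # labelled True (a "3")
img_b = np.array([[0.6, 0.3], [0.4, 0.8]])  # labelled False (a "6")
images = [img_a, img_b]
labels = [True, False]

def count_contradictions(xs, ys):
    """Count distinct images that appear with more than one label."""
    mapping = collections.defaultdict(set)
    for x, y in zip(xs, ys):
        mapping[tuple(x.flatten())].add(y)
    return sum(1 for seen in mapping.values() if len(seen) > 1)

before = count_contradictions(images, labels)

# After thresholding, both images collapse to the same binary pattern.
binarized = [(x > THRESHOLD).astype(np.float32) for x in images]
after = count_contradictions(binarized, labels)

print(before, after)
```

The grayscale images are distinct, so there is no contradiction before thresholding; afterwards both map to the same pattern while keeping different labels.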

The qubits at pixel indices whose values exceed the threshold are rotated through an $X$ gate.

In [ ]:

def convert_to_circuit(image):
    """Encode truncated classical image into quantum datapoint."""
    values = np.ndarray.flatten(image)
    qubits = cirq.GridQubit.rect(4, 4)
    circuit = cirq.Circuit()
    for i, value in enumerate(values):
        if value:
            circuit.append(cirq.X(qubits[i]))
    return circuit


x_train_circ = [convert_to_circuit(x) for x in x_train_bin]
x_test_circ = [convert_to_circuit(x) for x in x_test_bin]

Here is the circuit created for the first example (circuit diagrams do not show qubits with zero gates):

In [ ]:

SVGCircuit(x_train_circ[0])

Compare this circuit to the indices where the image value exceeds the threshold:

In [ ]:

bin_img = x_train_bin[0,:,:,0]
indices = np.array(np.where(bin_img)).T
indices
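
The comparison works because `cirq.GridQubit.rect(4, 4)` lists qubits in row-major order, so the flattened pixel index `i` corresponds to the grid position `(i // 4, i % 4)`. The sketch below checks that correspondence with NumPy alone; the binary image is a made-up example rather than an MNIST digit.

```python
import numpy as np

# Hypothetical 4x4 binary image (1 marks a pixel above the threshold).
bin_img = np.array([[0, 0, 1, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 0],
                    [1, 0, 0, 1]])

# Flat indices that would receive an X gate in convert_to_circuit.
flat_indices = np.flatnonzero(bin_img.flatten())

# Map each flat index back to (row, col), assuming row-major qubit order.
grid_positions = [(int(i) // 4, int(i) % 4) for i in flat_indices]

# The same positions, read directly from the image.
expected = [tuple(int(v) for v in rc) for rc in np.array(np.where(bin_img)).T]

print(grid_positions)
print(grid_positions == expected)
```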

Convert these Cirq circuits to tensors for tfq:

In [ ]:

x_train_tfcirc = tfq.convert_to_tensor(x_train_circ)
x_test_tfcirc = tfq.convert_to_tensor(x_test_circ)

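2. Quantum neural network

There is little guidance on the structure of a quantum circuit that classifies images. Because the classification is based on the expectation of the readout qubit, Farhi et al. propose using two-qubit gates in which the readout qubit is always acted upon. This is similar in some ways to running a small unitary RNN across the pixels.

2.1 Build the model circuit

The following example shows this layered approach. Each layer uses n instances of the same gate, with each of the data qubits acting on the readout qubit.

Start with a simple class that adds a layer of these gates to a circuit: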

In [ ]:

class CircuitLayerBuilder():
    def __init__(self, data_qubits, readout):
        self.data_qubits = data_qubits
        self.readout = readout
    
    def add_layer(self, circuit, gate, prefix):
        for i, qubit in enumerate(self.data_qubits):
            symbol = sympy.Symbol(prefix + '-' + str(i))
            circuit.append(gate(qubit, self.readout)**symbol)

Build an example circuit layer to see how it looks:

In [ ]:

demo_builder = CircuitLayerBuilder(data_qubits = cirq.GridQubit.rect(4,1),
                                   readout=cirq.GridQubit(-1,-1))

circuit = cirq.Circuit()
demo_builder.add_layer(circuit, gate = cirq.XX, prefix='xx')
SVGCircuit(circuit)
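
In `add_layer`, the expression `gate(qubit, self.readout)**symbol` raises a two-qubit gate to a trainable power. As a rough sketch of what that means for `cirq.XX`, the matrix of `XX**t` can be written as $P_+ + e^{i\pi t} P_-$, where $P_\pm = (I \pm X \otimes X)/2$ project onto the $\pm 1$ eigenspaces of $X \otimes X$. This is assumed to match Cirq's eigenvalue-based convention for gate powers; the helper name `xx_power` is hypothetical and the code uses NumPy only.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
XX = np.kron(X, X)
I4 = np.eye(4, dtype=complex)

def xx_power(t):
    """Matrix of XX**t built from the +1 and -1 eigenspaces of XX (hypothetical helper)."""
    p_plus = (I4 + XX) / 2
    p_minus = (I4 - XX) / 2
    return p_plus + np.exp(1j * np.pi * t) * p_minus

u = xx_power(0.3)
is_unitary = np.allclose(u.conj().T @ u, I4)       # unitary for a real exponent
is_identity_at_0 = np.allclose(xx_power(0.0), I4)  # exponent 0: the gate does nothing
is_xx_at_1 = np.allclose(xx_power(1.0), XX)        # exponent 1: the full XX gate

print(is_unitary, is_identity_at_0, is_xx_at_1)
```

Training the symbol therefore interpolates smoothly between leaving the qubit pair untouched and applying the full gate.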

Now build a two-layer model that matches the data-circuit size, and include the preparation and readout operations.

In [ ]:

def create_quantum_model():
    """Create a QNN model circuit and readout operation to go along with it."""
    data_qubits = cirq.GridQubit.rect(4, 4)  # a 4x4 grid.
    readout = cirq.GridQubit(-1, -1)         # a single qubit at [-1,-1]
    circuit = cirq.Circuit()
    
    # Prepare the readout qubit.
    circuit.append(cirq.X(readout))
    circuit.append(cirq.H(readout))
    
    builder = CircuitLayerBuilder(
        data_qubits = data_qubits,
        readout=readout)

    # Then add layers (experiment by adding more).
    builder.add_layer(circuit, cirq.XX, "xx1")
    builder.add_layer(circuit, cirq.ZZ, "zz1")

    # Finally, prepare the readout qubit.
    circuit.append(cirq.H(readout))

    return circuit, cirq.Z(readout)

In [ ]:

model_circuit, model_readout = create_quantum_model()

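2.2 Wrap the model circuit in a tfq-keras model

Build the Keras model with the quantum components. This model is fed the "quantum data" from x_train_circ, which encodes the classical data. It uses a parameterized quantum circuit layer, tfq.layers.PQC, to train the model circuit on the quantum data.

To classify these images, Farhi et al. proposed taking the expectation of a readout qubit in a parameterized circuit. The expectation returns a value between 1 and -1.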

In [ ]:

# Build the Keras model.
model = tf.keras.Sequential([
    # The input is the data-circuit, encoded as a tf.string
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    # The PQC layer returns the expected value of the readout gate, range [-1,1].
    tfq.layers.PQC(model_circuit, model_readout),
])
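
For intuition about the readout value, the expectation of $Z$ on a single qubit can be computed by hand. The sketch below uses plain NumPy rather than TensorFlow Quantum. It follows the readout preparation in `create_quantum_model` (X, then H, then the final H) with all trainable layers omitted, which is an idealization, and `expectation_z` is a hypothetical helper.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectation_z(state):
    """Return <state|Z|state> for a normalized single-qubit state."""
    return float(np.real(np.vdot(state, Z @ state)))

zero = np.array([1, 0], dtype=complex)

# Readout preparation (X then H) followed directly by the final H.
readout = H @ (H @ (X @ zero))

e_zero = expectation_z(zero)        # |0>: upper end of the range
e_readout = expectation_z(readout)  # ends in |1>: lower end of the range
e_plus = expectation_z(H @ zero)    # equal superposition: middle of the range

print(e_zero, e_readout, e_plus)
```

With no trainable gates acting, the readout qubit ends in the state with expectation -1; the trained layers move this value within [-1, 1] depending on the data qubits.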

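Next, describe the training procedure to the model using the compile method.

Since the expected readout is in the range [-1,1], optimizing the hinge loss is a fairly natural fit.

Note: Another valid approach would be to shift the output range to [0,1] and treat it as the probability the model assigns to class 3. This could be used with a standard tf.losses.BinaryCrossentropy loss.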
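
As a sketch of that alternative, the expectation $e \in [-1, 1]$ can be mapped to $p = (e + 1)/2$ and scored with binary cross-entropy. The NumPy code below only illustrates the arithmetic on made-up values; it is not wired into the Keras model, and the helper names are hypothetical.

```python
import numpy as np

def expectation_to_probability(expectation):
    """Shift a readout expectation from [-1, 1] to a probability in [0, 1]."""
    return (np.asarray(expectation, dtype=float) + 1.0) / 2.0

def binary_cross_entropy(y_true, prob, eps=1e-7):
    """Mean binary cross-entropy; eps guards against log(0)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(-(y_true * np.log(prob) + (1.0 - y_true) * np.log(1.0 - prob))))

expectations = np.array([0.8, -0.6, 0.0])  # hypothetical PQC outputs
labels = np.array([True, False, True])      # True means the digit 3

probs = expectation_to_probability(expectations)
loss = binary_cross_entropy(labels, probs)

print(probs)
print(loss)
```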

To use the hinge loss here you need to make two small adjustments. First, convert the labels, y_train_nocon, from boolean to [-1,1], as expected by the hinge loss.

In [ ]:

y_train_hinge = 2.0*y_train_nocon-1.0
y_test_hinge = 2.0*y_test-1.0

Second, use a custom hinge_accuracy metric that correctly handles [-1, 1] as the y_true labels argument. tf.keras.metrics.BinaryAccuracy(threshold=0.0) expects y_true to be a boolean, and so cannot be used with hinge loss.

In [ ]:

def hinge_accuracy(y_true, y_pred):
    y_true = tf.squeeze(y_true) > 0.0
    y_pred = tf.squeeze(y_pred) > 0.0
    result = tf.cast(y_true == y_pred, tf.float32)

    return tf.reduce_mean(result)
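
To see how the hinge loss and this accuracy metric behave together, here is a NumPy sketch on made-up predictions. The hinge loss is taken as the mean of $\max(0, 1 - y\,\hat{y})$ with labels $y \in \{-1, 1\}$; the function names are hypothetical and mirror, rather than replace, the Keras versions.

```python
import numpy as np

def hinge_loss_np(y_true, y_pred):
    """Mean hinge loss for labels in {-1, 1}."""
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * y_pred)))

def hinge_accuracy_np(y_true, y_pred):
    """Fraction of predictions whose sign matches the label's sign."""
    return float(np.mean((y_true > 0.0) == (y_pred > 0.0)))

y_bool = np.array([True, False, True, False])
y_hinge = 2.0 * y_bool - 1.0               # booleans -> {-1, 1}
y_pred = np.array([0.9, -0.2, -0.4, 0.1])  # hypothetical readout expectations

loss = hinge_loss_np(y_hinge, y_pred)
accuracy = hinge_accuracy_np(y_hinge, y_pred)

print(loss, accuracy)
```

Note that a prediction with the correct sign but a small magnitude (such as -0.2 here) still contributes to the loss even though it counts as correct for the accuracy.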

In [ ]:

model.compile(
    loss=tf.keras.losses.Hinge(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[hinge_accuracy])

In [ ]:

print(model.summary())

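Train the quantum model

Now train the model. This takes about 45 minutes. If you don't want to wait that long, use a small subset of the data (set NUM_EXAMPLES=500, below). This doesn't really affect the model's progress during training, since it has only 32 parameters and doesn't need much data to constrain them. Using fewer examples ends training earlier (about 5 minutes) but still runs long enough to show progress in the validation logs.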

In [ ]:

EPOCHS = 3
BATCH_SIZE = 32

NUM_EXAMPLES = len(x_train_tfcirc)

In [ ]:

x_train_tfcirc_sub = x_train_tfcirc[:NUM_EXAMPLES]
y_train_hinge_sub = y_train_hinge[:NUM_EXAMPLES]

Training this model to convergence should achieve more than 85% accuracy on the test set.

In [ ]:

qnn_history = model.fit(
      x_train_tfcirc_sub, y_train_hinge_sub,
      batch_size=BATCH_SIZE,
      epochs=EPOCHS,
      verbose=1,
      validation_data=(x_test_tfcirc, y_test_hinge))

qnn_results = model.evaluate(x_test_tfcirc, y_test_hinge)

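Note: The training accuracy reports the average over the epoch. The validation accuracy is evaluated at the end of each epoch.

3. Classical neural network

While the quantum neural network works for this simplified MNIST problem, a basic classical neural network can easily outperform a QNN on this task. After a single epoch, a classical neural network can achieve more than 98% accuracy on the holdout set.

In the following example, a classical neural network is used for the 3-6 classification problem using the entire 28x28 image instead of subsampling the image. This easily converges to nearly 100% accuracy on the test set.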

In [ ]:

def create_classical_model():
    # A simple model based off LeNet from https://keras.io/examples/mnist_cnn/
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(32, [3, 3], activation='relu', input_shape=(28,28,1)))
    model.add(tf.keras.layers.Conv2D(64, [3, 3], activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(1))
    return model


model = create_classical_model()
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.summary()

In [ ]:

model.fit(x_train,
          y_train,
          batch_size=128,
          epochs=1,
          verbose=1,
          validation_data=(x_test, y_test))

cnn_results = model.evaluate(x_test, y_test)

The above model has nearly 1.2M parameters. For a fairer comparison, try a 37-parameter model on the subsampled images:

In [ ]:

def create_fair_classical_model():
    # A small fully connected model (37 parameters) for a fairer comparison with the QNN.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(4,4,1)))
    model.add(tf.keras.layers.Dense(2, activation='relu'))
    model.add(tf.keras.layers.Dense(1))
    return model


model = create_fair_classical_model()
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.summary()
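
The parameter counts quoted in this tutorial can be checked by hand. A short sketch of the arithmetic, assuming one trainable symbol per data qubit per layer for the QNN and the usual weights-plus-biases count for dense layers (`dense_params` is a hypothetical helper):

```python
# QNN: two layers (XX and ZZ), one trainable symbol per data qubit.
num_data_qubits = 4 * 4
qnn_params = 2 * num_data_qubits

def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

# Fair classical model: Flatten (16 values) -> Dense(2) -> Dense(1).
fair_params = dense_params(16, 2) + dense_params(2, 1)

print(qnn_params, fair_params)
```

The two counts should agree with the totals reported by `model.summary()` for the respective models.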

In [ ]:

model.fit(x_train_bin,
          y_train_nocon,
          batch_size=128,
          epochs=20,
          verbose=2,
          validation_data=(x_test_bin, y_test))

fair_nn_results = model.evaluate(x_test_bin, y_test)

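4. Comparison

Higher-resolution input and a more powerful model make this problem easy for the CNN, while a classical model of similar power (~32 parameters) trains to a similar accuracy in a fraction of the time. One way or the other, the classical neural network easily outperforms the quantum neural network. For classical data, it is difficult to beat a classical neural network.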

In [ ]:

qnn_accuracy = qnn_results[1]
cnn_accuracy = cnn_results[1]
fair_nn_accuracy = fair_nn_results[1]

sns.barplot(x=["Quantum", "Classical, full", "Classical, fair"],
            y=[qnn_accuracy, cnn_accuracy, fair_nn_accuracy])