Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

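Barren plateaus

In this example you will explore the result of McClean, 2019 that not just any quantum neural network structure will do well when it comes to learning. In particular, you will see that a certain large family of random quantum circuits does not serve as good quantum neural networks, because their gradients vanish almost everywhere. In this example you won't train a model for a specific learning problem; instead you will focus on the simpler problem of understanding the behavior of gradients.

Setup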

In [ ]:

!pip install tensorflow==2.7.0

Install TensorFlow Quantum:

In [ ]:

!pip install tensorflow-quantum==0.7.2

In [ ]:

# Update package resources to account for version changes.
import importlib, pkg_resources
importlib.reload(pkg_resources)

Now import TensorFlow and the module dependencies:

In [ ]:

import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np

# visualization tools
%matplotlib inline
import matplotlib.pyplot as plt
from cirq.contrib.svg import SVGCircuit

np.random.seed(1234)

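1. Summary

Consider random quantum circuits built from many blocks, where each $R_{P}(\theta)$ is a random Pauli rotation.

If $f(x)$ is defined as the expectation value of $Z_{a}Z_{b}$ for any qubits $a$ and $b$, then the problem is that $f'(x)$ has a mean very close to 0 and does not vary much. You will see this below.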
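Stated a little more formally, and as a paraphrase of the result in McClean et al. rather than a quotation, the claim concerns the distribution of the derivative over random circuit instances that are sufficiently deep. The symbols $n$ (the number of qubits) and $b$ (an unspecified constant) are notation introduced here for illustration only:

$$
\mathbb{E}\left[f'(x)\right] = 0, \qquad \mathrm{Var}\left[f'(x)\right] \in O\left(b^{-n}\right), \quad b > 1
$$

In words: the gradient averages to zero, and its spread around zero shrinks exponentially as qubits are added, so a randomly chosen starting point almost always sits on a flat region.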

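2. Generating random circuits

The construction from the paper is straightforward to follow. The following implements a simple function that generates a random quantum circuit, sometimes referred to as a quantum neural network (QNN), with the given depth on a set of qubits: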

In [ ]:

def generate_random_qnn(qubits, symbol, depth):
    """Generate random QNN's with the same structure from McClean et al."""
    circuit = cirq.Circuit()
    for qubit in qubits:
        circuit += cirq.ry(np.pi / 4.0)(qubit)

    for d in range(depth):
        # Add a series of single qubit rotations.
        for i, qubit in enumerate(qubits):
            random_n = np.random.uniform()
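            # Only the first rotation on the first qubit keeps the symbol;
            # every other rotation gets a fixed random angle.
            random_rot = (symbol if i == 0 and d == 0 else
                          np.random.uniform() * 2.0 * np.pi)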
            if random_n > 2. / 3.:
                # Add a Z.
                circuit += cirq.rz(random_rot)(qubit)
            elif random_n > 1. / 3.:
                # Add a Y.
                circuit += cirq.ry(random_rot)(qubit)
            else:
                # Add a X.
                circuit += cirq.rx(random_rot)(qubit)

        # Add CZ ladder.
        for src, dest in zip(qubits, qubits[1:]):
            circuit += cirq.CZ(src, dest)

    return circuit


generate_random_qnn(cirq.GridQubit.rect(1, 3), sympy.Symbol('theta'), 2)

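The authors investigate the gradient of a single parameter $\theta_{1,1}$. Let's follow along by placing a sympy.Symbol in the circuit where $\theta_{1,1}$ would be. Since the authors do not analyze the statistics of any other symbol in the circuit, all other rotation angles are filled in with random values right away, at circuit-generation time.

3. Running the circuits

Generate a few of these circuits along with an observable to test the claim that the gradients don't vary much. First, generate a batch of random circuits. Choose a ZZ observable (the code below uses the first two qubits) and batch calculate the gradients and their variance using TensorFlow Quantum.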
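To make concrete what "the gradient of an expectation value" means, here is a minimal single-qubit sketch in plain Python that does not use TensorFlow Quantum. It assumes the state $R_y(\theta)|0\rangle$, for which $\langle Z \rangle = \cos\theta$, and estimates the derivative with the parameter-shift rule. The function names are illustrative, and this is not meant as a description of how `tfq.layers.Expectation` differentiates internally.

```python
import math


def expectation_z(theta):
    """<Z> for the single-qubit state Ry(theta)|0>, which equals cos(theta)."""
    # Ry(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    amp0 = math.cos(theta / 2.0)
    amp1 = math.sin(theta / 2.0)
    return amp0**2 - amp1**2


def parameter_shift_gradient(f, theta):
    """Derivative of f at theta from two evaluations shifted by +/- pi/2."""
    return 0.5 * (f(theta + math.pi / 2.0) - f(theta - math.pi / 2.0))


theta = 0.3
grad = parameter_shift_gradient(expectation_z, theta)
# For this state the exact derivative is -sin(theta).
print(grad, -math.sin(theta))
```

For one qubit this derivative is large for most angles. The barren plateau result is about what happens to the same quantity when the circuit is wide, deep, and random.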

3.1 Batch variance computation

Let's write a helper function that computes the variance of the gradient of a given observable over a batch of circuits:

In [ ]:

def process_batch(circuits, symbol, op):
    """Compute the variance of a batch of expectations w.r.t. op on each circuit that 
    contains `symbol`. Note that this method sets up a new compute graph every time it is
    called so it isn't as performant as possible."""

    # Setup a simple layer to batch compute the expectation gradients.
    expectation = tfq.layers.Expectation()

    # Prep the inputs as tensors
    circuit_tensor = tfq.convert_to_tensor(circuits)
    # Size the symbol values from the batch itself rather than a global.
    values_tensor = tf.convert_to_tensor(
        np.random.uniform(0, 2 * np.pi, (len(circuits), 1)).astype(np.float32))

    # Use TensorFlow GradientTape to track gradients.
    with tf.GradientTape() as g:
        g.watch(values_tensor)
        forward = expectation(circuit_tensor,
                              operators=op,
                              symbol_names=[symbol],
                              symbol_values=values_tensor)

    # Return variance of gradients across all circuits.
    grads = g.gradient(forward, values_tensor)
    grad_var = tf.math.reduce_variance(grads, axis=0)
    return grad_var.numpy()[0]
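The statistic of interest is the variance of the per-circuit gradients across the batch. As a point of reference, the sketch below computes a population variance by hand in plain Python and compares it with the standard library. The gradient values are made-up placeholders, not output from the circuits above. Keep in mind that a standard deviation is the square root of the variance, so the two are not interchangeable on a log-scale plot.

```python
import math
import statistics


def batch_variance(grads):
    """Population variance: the mean squared deviation from the batch mean."""
    mean = sum(grads) / len(grads)
    return sum((g - mean)**2 for g in grads) / len(grads)


# Placeholder gradients, one per circuit (illustrative values only).
example_grads = [0.02, -0.01, 0.03, -0.04, 0.0]

var = batch_variance(example_grads)
std = math.sqrt(var)
# The hand-written value should agree with the standard library.
print(var, statistics.pvariance(example_grads))
print(std, statistics.pstdev(example_grads))
```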

3.2 Set up and run

Choose the number of random circuits to generate, along with their depth and the number of qubits they should act on. Then plot the results.

In [ ]:

# Ranges studied in paper are between 2 and 24.
n_qubits = [2 * i for i in range(2, 7)]
depth = 50  # Ranges studied in paper are between 50 and 500.
n_circuits = 200
theta_var = []

for n in n_qubits:
    # Generate the random circuits and observable for the given n.
    qubits = cirq.GridQubit.rect(1, n)
    symbol = sympy.Symbol('theta')
    circuits = [
        generate_random_qnn(qubits, symbol, depth) for _ in range(n_circuits)
    ]
    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
    theta_var.append(process_batch(circuits, symbol, op))

plt.semilogy(n_qubits, theta_var)
plt.title('Gradient Variance in QNNs')
plt.xlabel('n_qubits')
plt.xticks(n_qubits)
plt.ylabel('$\\partial \\theta$ variance')
plt.show()

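This plot shows that for quantum machine learning problems, you can't simply guess a random QNN ansatz and hope for the best. Some structure must be present in the model circuit for the gradients to vary enough that learning can happen.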
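One way to quantify the trend in the plot above is to fit a straight line to the logarithm of the variance against the number of qubits; an exponential decay shows up as a negative slope. The helper below is a minimal sketch using ordinary least squares in plain Python. The name `log_variance_slope` is hypothetical, and the sample inputs are synthetic values that halve with every added qubit, not measurements from this notebook.

```python
import math


def log_variance_slope(n_qubits, variances):
    """Least-squares slope of ln(variance) versus qubit count."""
    xs = [float(n) for n in n_qubits]
    ys = [math.log(v) for v in variances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean)**2 for x in xs)
    return num / den


# Synthetic check: variances proportional to 2**-n give a slope of -ln(2).
synthetic_n = [4, 6, 8, 10, 12]
synthetic_var = [2.0**-n for n in synthetic_n]
slope = log_variance_slope(synthetic_n, synthetic_var)
print(slope)
```

With the results computed earlier, the same helper could be called as `log_variance_slope(n_qubits, theta_var)`.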

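4. Heuristics

An interesting heuristic by Grant, 2019 allows one to start very close to random, but not quite. Using the same circuits as McClean et al., the authors propose a different initialization technique for the classical control parameters to avoid barren plateaus. The technique starts some layers with totally random control parameters, but in the layers immediately following, it chooses parameters such that the initial transformation made by the first few layers is undone. The authors call this an *identity block*.

The advantage of this heuristic is that when just a single parameter is changed, all blocks outside of the current block remain the identity, and the gradient signal comes through much stronger than before. This allows the user to pick and choose which variables and blocks to modify to get a strong gradient signal. The heuristic does not prevent the user from falling into a barren plateau during the training phase (and it restricts a fully simultaneous update); it only guarantees that you can start outside of a plateau.

4.1 New QNN construction

Now construct a function to generate identity block QNNs. This implementation is slightly different from the one in the paper. Because, for now, you only look at the gradient behavior of a single parameter, consistent with McClean et al., some simplifications can be made.

To generate an identity block and train the model, you generally need $U1(\theta_{1a}) U1(\theta_{1b})^{\dagger}$ and not $U1(\theta_1) U1(\theta_1)^{\dagger}$. Initially $\theta_{1a}$ and $\theta_{1b}$ are the same angles, but they are learned independently. Otherwise, you would always get the identity, even after training. The choice for the number of identity blocks is empirical. The deeper the block, the smaller the variance in the middle of the block. But at the start and end of the block, the variance of the parameter gradients should be large.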
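The point about separate parameters can be checked on the smallest possible case. The sketch below, in plain Python with hand-written 2x2 matrices, uses a single $R_y$ rotation as a stand-in for $U1$. That is an assumption made purely for illustration, since the blocks in the paper are multi-qubit circuits. With equal angles the product is the identity; once the two angles differ, the product becomes $R_y(\theta_{1a} - \theta_{1b})$, so the block is free to move away from the identity during training.

```python
import math


def ry(theta):
    """Real 2x2 matrix of a single-qubit Y rotation by angle theta."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return [[c, -s], [s, c]]


def dagger(m):
    """Conjugate transpose; a plain transpose suffices for real matrices."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Same initial angle on both halves: the block starts as the identity.
theta_a, theta_b = 0.7, 0.7
identity_block = matmul(ry(theta_a), dagger(ry(theta_b)))

# After an independent update to theta_a the block is no longer the identity.
theta_a = 0.9
trained_block = matmul(ry(theta_a), dagger(ry(theta_b)))
print(identity_block)
print(trained_block)
```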

In [ ]:

def generate_identity_qnn(qubits, symbol, block_depth, total_depth):
    """Generate random QNN's with the same structure from Grant et al."""
    circuit = cirq.Circuit()

    # Generate initial block with symbol.
    prep_and_U = generate_random_qnn(qubits, symbol, block_depth)
    circuit += prep_and_U

    # Generate dagger of initial block without symbol.
    U_dagger = (prep_and_U[1:])**-1
    circuit += cirq.resolve_parameters(
        U_dagger, param_resolver={symbol: np.random.uniform() * 2 * np.pi})

    for d in range(total_depth - 1):
        # Get a random QNN.
        prep_and_U_circuit = generate_random_qnn(
            qubits,
            np.random.uniform() * 2 * np.pi, block_depth)

        # Remove the state-prep component
        U_circuit = prep_and_U_circuit[1:]

        # Add U
        circuit += U_circuit

        # Add U^dagger
        circuit += U_circuit**-1

    return circuit


generate_identity_qnn(cirq.GridQubit.rect(1, 3), sympy.Symbol('theta'), 2, 2)

4.2 Comparison

Here you can see that the heuristic does help to keep the variance of the gradient from vanishing as quickly:

In [ ]:

block_depth = 10
total_depth = 5

heuristic_theta_var = []

for n in n_qubits:
    # Generate the identity block circuits and observable for the given n.
    qubits = cirq.GridQubit.rect(1, n)
    symbol = sympy.Symbol('theta')
    circuits = [
        generate_identity_qnn(qubits, symbol, block_depth, total_depth)
        for _ in range(n_circuits)
    ]
    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
    heuristic_theta_var.append(process_batch(circuits, symbol, op))

plt.semilogy(n_qubits, theta_var)
plt.semilogy(n_qubits, heuristic_theta_var)
plt.title('Heuristic vs. Random')
plt.xlabel('n_qubits')
plt.xticks(n_qubits)
plt.ylabel('$\\partial \\theta$ variance')
plt.show()

This is a great improvement in getting stronger gradient signals from (near) random QNNs.