GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/quantum/tutorials/barren_plateaus.ipynb
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

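Barren plateaus

This tutorial explores the result of McClean et al. (2018), which shows that not just any quantum neural network structure will do well when it comes to learning. In particular, you will see that a certain large family of random quantum circuits does not serve as a good quantum neural network, because its gradients vanish almost everywhere. In this example you will not train any model for a specific learning problem; the focus is on the simpler problem of understanding how the gradients behave.

Setup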

!pip install tensorflow==2.7.0

Install TensorFlow Quantum:

!pip install tensorflow-quantum==0.7.2
# Update package resources to account for version changes.
import importlib, pkg_resources
importlib.reload(pkg_resources)

Now import TensorFlow and the module dependencies:

import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np

# visualization tools
%matplotlib inline
import matplotlib.pyplot as plt
from cirq.contrib.svg import SVGCircuit

np.random.seed(1234)

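1. Summary

Consider a random quantum circuit made of many blocks, each containing random Pauli rotations $R_{P}(\theta)$.

If $f(x)$ is defined as the expectation value of $Z_{a}Z_{b}$ for any qubits $a$ and $b$, then the problem is that $f'(x)$ has a mean very close to 0 and does not vary much. The sections below demonstrate this.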
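Stated a little more formally, the claim can be sketched as follows. This is a paraphrase of how the barren plateau result is commonly summarized, not a formula taken from this tutorial: the unitary $U(x)$, the qubit count $n$, and the constant $b$ are notation introduced here for illustration, and the scaling is assumed to hold only for sufficiently deep random circuits.

```latex
\begin{aligned}
f(x) &= \langle 0 |\, U(x)^{\dagger}\, Z_{a} Z_{b}\, U(x)\, | 0 \rangle, \\
\mathbb{E}\left[ f'(x) \right] &= 0, \\
\operatorname{Var}\left[ f'(x) \right] &\in O\!\left( b^{-n} \right) \quad \text{for some } b > 1,
\end{aligned}
```

Here the expectation and variance are taken over the random choice of circuit, so both the typical gradient and its spread shrink as qubits are added.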

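2. Generating random circuits

The construction from the paper is straightforward to follow. The following code implements a simple function that generates a random quantum circuit, sometimes referred to as a quantum neural network (QNN), with a given depth on a set of qubits: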

def generate_random_qnn(qubits, symbol, depth):
    """Generate random QNN's with the same structure from McClean et al."""
    circuit = cirq.Circuit()
    for qubit in qubits:
        circuit += cirq.ry(np.pi / 4.0)(qubit)

    for d in range(depth):
        # Add a series of single qubit rotations.
        for i, qubit in enumerate(qubits):
            random_n = np.random.uniform()
            # Only the very first rotation carries the symbol; all other
            # angles are fixed to random values.
            random_rot = np.random.uniform(
            ) * 2.0 * np.pi if i != 0 or d != 0 else symbol
            if random_n > 2. / 3.:
                # Add a Z.
                circuit += cirq.rz(random_rot)(qubit)
            elif random_n > 1. / 3.:
                # Add a Y.
                circuit += cirq.ry(random_rot)(qubit)
            else:
                # Add a X.
                circuit += cirq.rx(random_rot)(qubit)

        # Add CZ ladder.
        for src, dest in zip(qubits, qubits[1:]):
            circuit += cirq.CZ(src, dest)

    return circuit


generate_random_qnn(cirq.GridQubit.rect(1, 3), sympy.Symbol('theta'), 2)

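The authors investigate the gradient of a single parameter $\theta_{1,1}$. Follow along by placing a `sympy.Symbol` in the circuit where $\theta_{1,1}$ would be. Since the authors do not analyze the statistics of any other symbol in the circuit, those are replaced with random values right away rather than later.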
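To make "the gradient of a single parameter" concrete, here is a minimal single-qubit sketch in plain NumPy. It uses the parameter-shift rule, which is one common way to differentiate an expectation value with respect to a rotation angle; it is not necessarily how TensorFlow Quantum computes gradients internally. The helper names `ry`, `expectation`, and `parameter_shift_gradient` are hypothetical and chosen only for this illustration.

```python
import numpy as np

Z = np.diag([1.0, -1.0])


def ry(theta):
    """Matrix of a Y rotation, exp(-i * theta / 2 * Y), which is real-valued."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def expectation(theta):
    """<Z> after applying ry(theta) to |0>; analytically this is cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ Z @ state)


def parameter_shift_gradient(f, theta):
    """Derivative of f at theta from two evaluations shifted by +/- pi/2."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))


theta = 0.3
print(parameter_shift_gradient(expectation, theta), -np.sin(theta))
```

The two printed numbers agree because the expectation is a pure sinusoid in the angle, which is the situation for any gate of the form exp(-i * theta / 2 * P) with P a Pauli operator.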

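3. Running the circuits

Generate a few of these circuits along with an observable to test the claim that the gradients do not vary much. First, generate a batch of random circuits. Then choose a random ZZ observable and use TensorFlow Quantum to batch-compute the gradients and their variance.

3.1 Batch variance computation

Write a helper function that computes the variance of the gradient of a given observable over a batch of circuits: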

def process_batch(circuits, symbol, op):
    """Compute the variance of a batch of expectations w.r.t. op on each circuit that
    contains `symbol`. Note that this method sets up a new compute graph every time it is
    called so it isn't as performant as possible."""

    # Setup a simple layer to batch compute the expectation gradients.
    expectation = tfq.layers.Expectation()

    # Prep the inputs as tensors. One random symbol value per circuit; the
    # batch size is taken from `circuits` rather than from a global variable.
    circuit_tensor = tfq.convert_to_tensor(circuits)
    values_tensor = tf.convert_to_tensor(
        np.random.uniform(0, 2 * np.pi, (len(circuits), 1)).astype(np.float32))

    # Use TensorFlow GradientTape to track gradients.
    with tf.GradientTape() as g:
        g.watch(values_tensor)
        forward = expectation(circuit_tensor,
                              operators=op,
                              symbol_names=[symbol],
                              symbol_values=values_tensor)

    # Return variance of gradients across all circuits.
    # reduce_variance matches the docstring and the plot labels; the earlier
    # reduce_std call returned the standard deviation instead.
    grads = g.gradient(forward, values_tensor)
    grad_var = tf.math.reduce_variance(grads, axis=0)
    return grad_var.numpy()[0]
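For readers who want to see what this batch computation measures without installing TensorFlow Quantum, the following is a rough NumPy-only sketch of the same experiment. It is an illustration under stated assumptions, not the tutorial's implementation: it simulates state vectors directly, differentiates with the parameter-shift rule rather than TFQ's differentiators, and always uses the observable Z on qubits 0 and 1. All function names here (`rotation`, `apply_gate`, `apply_cz`, `sample_circuit`, `expectation_zz`, `gradient`, `gradient_variance`) are hypothetical.

```python
import numpy as np

# Pauli X, Y, Z, indexed 0, 1, 2.
PAULIS = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)


def rotation(axis, angle):
    """exp(-i * angle / 2 * P) for the Pauli P selected by `axis`."""
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * PAULIS[axis]


def apply_gate(state, gate, qubit):
    """Apply a 2x2 gate to one axis of an n-qubit state tensor."""
    state = np.tensordot(gate, state, axes=([1], [qubit]))
    return np.moveaxis(state, 0, qubit)


def apply_cz(state, a, b):
    """Flip the sign of every amplitude in which qubits a and b are both 1."""
    index = [slice(None)] * state.ndim
    index[a], index[b] = 1, 1
    state = state.copy()
    state[tuple(index)] *= -1
    return state


def sample_circuit(n_qubits, depth, rng):
    """Random Pauli axes and angles for every rotation in the circuit."""
    axes = rng.integers(0, 3, size=(depth, n_qubits))
    angles = rng.uniform(0, 2 * np.pi, size=(depth, n_qubits))
    return axes, angles


def expectation_zz(axes, angles, theta):
    """<Z_0 Z_1> after the circuit; theta replaces the very first angle."""
    depth, n_qubits = axes.shape
    state = np.zeros((2,) * n_qubits, dtype=complex)
    state[(0,) * n_qubits] = 1.0
    for q in range(n_qubits):  # Initial Y rotation by pi / 4 on every qubit.
        state = apply_gate(state, rotation(1, np.pi / 4), q)
    for d in range(depth):
        for q in range(n_qubits):
            angle = theta if (d == 0 and q == 0) else angles[d, q]
            state = apply_gate(state, rotation(axes[d, q], angle), q)
        for q in range(n_qubits - 1):  # CZ ladder.
            state = apply_cz(state, q, q + 1)
    probs = (np.abs(state) ** 2).reshape(2, 2, -1).sum(axis=-1)
    return probs[0, 0] + probs[1, 1] - probs[0, 1] - probs[1, 0]


def gradient(axes, angles, theta):
    """Parameter-shift derivative of expectation_zz with respect to theta."""
    shift = np.pi / 2
    return 0.5 * (expectation_zz(axes, angles, theta + shift) -
                  expectation_zz(axes, angles, theta - shift))


def gradient_variance(n_qubits, depth, n_circuits, rng):
    """Variance of the gradient over a batch of freshly sampled circuits."""
    grads = [
        gradient(*sample_circuit(n_qubits, depth, rng), rng.uniform(0, 2 * np.pi))
        for _ in range(n_circuits)
    ]
    return np.var(grads)


rng = np.random.default_rng(0)
for n in (2, 4, 6):
    print(n, gradient_variance(n, depth=20, n_circuits=50, rng=rng))
```

The batch size and depth are deliberately small so the loop finishes quickly; with so few samples the printed variances are noisy estimates, so treat them as qualitative only.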

3.2 Set up and run

Choose the number of random circuits to generate, their depth, and the number of qubits they act on. Then plot the results.

n_qubits = [2 * i for i in range(2, 7)
           ]  # Ranges studied in paper are between 2 and 24.
depth = 50  # Ranges studied in paper are between 50 and 500.
n_circuits = 200
theta_var = []

for n in n_qubits:
    # Generate the random circuits and observable for the given n.
    qubits = cirq.GridQubit.rect(1, n)
    symbol = sympy.Symbol('theta')
    circuits = [
        generate_random_qnn(qubits, symbol, depth) for _ in range(n_circuits)
    ]
    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
    theta_var.append(process_batch(circuits, symbol, op))

plt.semilogy(n_qubits, theta_var)
plt.title('Gradient Variance in QNNs')
plt.xlabel('n_qubits')
plt.xticks(n_qubits)
plt.ylabel('$\\partial \\theta$ variance')
plt.show()

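This plot shows that for quantum machine learning problems, you cannot simply guess a random QNN ansatz and hope for the best. The model circuit must have some structure for the gradients to vary enough that learning can happen.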
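One way to put a number on the slope of a semilog plot like this is a straight-line fit of log-variance against qubit count. The sketch below is an illustration and not part of the original tutorial: `decay_rate` is a hypothetical helper, and the demo values are synthetic numbers constructed to decay as 2^-n, not measured results.

```python
import numpy as np


def decay_rate(n_qubits, variances):
    """Least-squares fit of log(variance) = slope * n + intercept."""
    n = np.asarray(n_qubits, dtype=float)
    log_var = np.log(np.asarray(variances, dtype=float))
    slope, intercept = np.polyfit(n, log_var, deg=1)
    return slope, intercept


# Synthetic placeholder data (NOT results from the tutorial's run).
n_qubits_demo = [4, 6, 8, 10, 12]
variances_demo = [2.0 ** -n for n in n_qubits_demo]

slope, intercept = decay_rate(n_qubits_demo, variances_demo)
print(slope, -np.log(2))
```

With real data, calling `decay_rate(n_qubits, theta_var)` on the lists built above would give an estimate of the exponential decay rate; a clearly negative slope is what a barren plateau looks like in this picture.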

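4. Heuristic

An interesting heuristic by Grant et al. (2019) allows one to start learning very close to random, but not quite. Using the same circuits as McClean et al., the authors propose a different initialization technique for the classical control parameters that avoids barren plateaus. The technique starts by initializing some layers with totally random control parameters, but in the layers immediately following, it chooses parameters such that the initial transformation made by the first few layers is undone. The authors call this an identity block.

The advantage of this heuristic is that when just a single parameter is changed, all blocks outside the current block remain the identity, and the gradient signal comes through much stronger than before. This lets the user pick and choose which variables and blocks to modify to get a strong gradient signal. The heuristic does not prevent the user from falling into a barren plateau during the training phase (and it restricts a fully simultaneous update); it only guarantees that you can start outside of a plateau.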

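4.1 New QNN construction

Now construct a function to generate identity-block QNNs. This implementation differs slightly from the one in the paper. For now, it is enough to look at the behavior of the gradient of a single parameter so that it is consistent with McClean et al., which allows some simplifications.

To generate an identity block and train the model, you generally need $U1(\theta_{1a}) U1(\theta_{1b})^{\dagger}$ and not $U1(\theta_1) U1(\theta_1)^{\dagger}$. Initially $\theta_{1a}$ and $\theta_{1b}$ are the same angle, but they are learned independently. Otherwise, you would always get the identity, even after training. The number of identity blocks is chosen empirically. The deeper the block, the smaller the variance in the middle of the block. At the start and end of the block, however, the variance of the parameter gradients should be large.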
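The difference between shared and independent parameters can be checked on a single qubit. The following NumPy sketch assumes, purely for illustration, that the block unitary is one X rotation; the names `rx` and `identity_block` are hypothetical and do not come from the paper or from TensorFlow Quantum.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
IDENTITY = np.eye(2, dtype=complex)


def rx(theta):
    """X rotation, exp(-i * theta / 2 * X)."""
    return np.cos(theta / 2) * IDENTITY - 1j * np.sin(theta / 2) * X


def identity_block(theta_a, theta_b):
    """Matrix for applying U(theta_a) first, then U(theta_b)^dagger."""
    return rx(theta_b).conj().T @ rx(theta_a)


# Equal angles: the block starts out as the identity.
print(np.allclose(identity_block(0.7, 0.7), IDENTITY))

# Independent angles: after theta_b moves, the block is a net rotation.
print(np.allclose(identity_block(0.7, 0.2), rx(0.5)))
```

If both halves shared one parameter, the first check would hold for every angle and the block could never represent anything but the identity, which is why the two angles need to be trained separately.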

def generate_identity_qnn(qubits, symbol, block_depth, total_depth):
    """Generate random QNN's with the same structure from Grant et al."""
    circuit = cirq.Circuit()

    # Generate initial block with symbol.
    prep_and_U = generate_random_qnn(qubits, symbol, block_depth)
    circuit += prep_and_U

    # Generate dagger of initial block without symbol.
    U_dagger = (prep_and_U[1:])**-1
    circuit += cirq.resolve_parameters(
        U_dagger, param_resolver={symbol: np.random.uniform() * 2 * np.pi})

    for d in range(total_depth - 1):
        # Get a random QNN.
        prep_and_U_circuit = generate_random_qnn(
            qubits,
            np.random.uniform() * 2 * np.pi, block_depth)

        # Remove the state-prep component
        U_circuit = prep_and_U_circuit[1:]

        # Add U
        circuit += U_circuit

        # Add U^dagger
        circuit += U_circuit**-1

    return circuit


generate_identity_qnn(cirq.GridQubit.rect(1, 3), sympy.Symbol('theta'), 2, 2)

4.2 Comparison

Here you can see that the heuristic does help to keep the variance of the gradient from vanishing as quickly:

block_depth = 10
total_depth = 5

heuristic_theta_var = []

for n in n_qubits:
    # Generate the identity block circuits and observable for the given n.
    qubits = cirq.GridQubit.rect(1, n)
    symbol = sympy.Symbol('theta')
    circuits = [
        generate_identity_qnn(qubits, symbol, block_depth, total_depth)
        for _ in range(n_circuits)
    ]
    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
    heuristic_theta_var.append(process_batch(circuits, symbol, op))

plt.semilogy(n_qubits, theta_var)
plt.semilogy(n_qubits, heuristic_theta_var)
plt.title('Heuristic vs. Random')
plt.xlabel('n_qubits')
plt.xticks(n_qubits)
plt.ylabel('$\\partial \\theta$ variance')
plt.show()

This is a great improvement in getting stronger gradient signals from (near) random QNNs.