GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/ja/guide/sparse_tensor_guide.ipynb

Kernel: Python 3

Copyright 2020 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

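Working with sparse tensors

When working with tensors that contain many zero values, it is important to store them in a space- and time-efficient manner. Sparse tensors enable efficient storage and processing of such tensors. They are used extensively in encoding schemes like TF-IDF as part of data pre-processing in NLP applications, and for pre-processing images with many dark pixels in computer vision applications.

Sparse tensors in TensorFlow

TensorFlow represents sparse tensors through the tf.sparse.SparseTensor object. Currently, sparse tensors in TensorFlow are encoded using the coordinate list (COO) format. This encoding format is optimized for hyper-sparse matrices such as embeddings.

The COO encoding for sparse tensors consists of:

values: A 1D tensor with shape [N] containing all nonzero values.
indices: A 2D tensor with shape [N, rank], containing the indices of the nonzero values.
dense_shape: A 1D tensor with shape [rank], specifying the shape of the tensor.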
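
To make the three components concrete, here is a minimal sketch that rebuilds a dense tensor by hand from values, indices, and dense_shape. The 3x4 example data and the variable names are made up for illustration; tf.scatter_nd is used only to show what the encoding means, not as the recommended conversion.

```python
import tensorflow as tf

# Hypothetical 3x4 example: two stored values at coordinates (0, 1) and (2, 3).
indices = tf.constant([[0, 1], [2, 3]], dtype=tf.int64)  # shape [N, rank]
values = tf.constant([7, 9], dtype=tf.int32)             # shape [N]
dense_shape = tf.constant([3, 4], dtype=tf.int64)        # shape [rank]

st = tf.sparse.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)

# Rebuild the dense tensor manually: start from zeros and write each value
# at its coordinate.
dense_by_hand = tf.scatter_nd(indices, values, dense_shape)

# The built-in conversion should give the same result.
dense_builtin = tf.sparse.to_dense(st)

print(dense_by_hand.numpy())
print(dense_builtin.numpy())
```

Every position that has no entry in indices is an implicit zero in the dense result.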

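In the context of a tf.sparse.SparseTensor, a nonzero value is a value that is explicitly encoded, that is, one stored in values. It is possible to explicitly include zero values in the values of a COO sparse matrix, but these "explicit zeros" are generally not included when referring to nonzero values in a sparse tensor.

Note: tf.sparse.SparseTensor does not require that indices/values be in any particular order, but several ops assume that they are in row-major order. Use tf.sparse.reorder to create a copy of the sparse tensor that is sorted in the canonical row-major order.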
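
As a small sketch of the reordering described in the note, the example below builds a sparse tensor whose indices are deliberately out of order and then sorts it with tf.sparse.reorder. The tensor contents and variable names are illustrative assumptions.

```python
import tensorflow as tf

# Indices deliberately listed out of row-major order.
st_unordered = tf.sparse.SparseTensor(indices=[[2, 4], [0, 3], [1, 0]],
                                      values=[20, 10, 15],
                                      dense_shape=[3, 10])

# tf.sparse.reorder returns a new sparse tensor sorted in row-major order;
# each value moves together with its index.
st_ordered = tf.sparse.reorder(st_unordered)

print(st_ordered.indices.numpy().tolist())
print(st_ordered.values.numpy().tolist())
```

The original tensor is left unchanged; only the returned copy is sorted.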

Creating a `tf.sparse.SparseTensor`

Construct sparse tensors by directly specifying their values, indices, and dense_shape.

In [ ]:

import tensorflow as tf

In [ ]:

st1 = tf.sparse.SparseTensor(indices=[[0, 3], [2, 4]],
                      values=[10, 20],
                      dense_shape=[3, 10])

When you use the print() function to print a sparse tensor, it shows the contents of the three component tensors.

In [ ]:

print(st1)

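It is easier to understand the contents of a sparse tensor if the nonzero values are aligned with their corresponding indices. Define a helper function that pretty-prints a sparse tensor so that each nonzero value is shown on its own line.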

In [ ]:

def pprint_sparse_tensor(st):
  # Header with the dense shape, then one "index: value" line per stored entry.
  s = "<SparseTensor shape=%s \n values={" % (st.dense_shape.numpy().tolist(),)
  for (index, value) in zip(st.indices, st.values):
    s += "\n  %s: %s" % (index.numpy().tolist(), value.numpy().tolist())
  return s + "}>"

In [ ]:

print(pprint_sparse_tensor(st1))

You can also construct sparse tensors from dense tensors by using tf.sparse.from_dense, and convert them back to dense tensors by using tf.sparse.to_dense.

In [ ]:

st2 = tf.sparse.from_dense([[1, 0, 0, 8], [0, 0, 0, 0], [0, 0, 3, 0]])
print(pprint_sparse_tensor(st2))

In [ ]:

st3 = tf.sparse.to_dense(st2)
print(st3)

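Manipulating sparse tensors

Use the utilities in the tf.sparse package to manipulate sparse tensors. Ops like tf.math.add that you can use for arithmetic on dense tensors do not work with sparse tensors.

Add sparse tensors of the same shape by using tf.sparse.add.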

In [ ]:

st_a = tf.sparse.SparseTensor(indices=[[0, 2], [3, 4]],
                       values=[31, 2], 
                       dense_shape=[4, 10])

st_b = tf.sparse.SparseTensor(indices=[[0, 2], [3, 0]],
                       values=[56, 38],
                       dense_shape=[4, 10])

st_sum = tf.sparse.add(st_a, st_b)

print(pprint_sparse_tensor(st_sum))

Use tf.sparse.sparse_dense_matmul to multiply sparse tensors with dense matrices.

In [ ]:

st_c = tf.sparse.SparseTensor(indices=([0, 1], [1, 0], [1, 1]),
                       values=[13, 15, 17],
                       dense_shape=(2,2))

mb = tf.constant([[4], [6]])
product = tf.sparse.sparse_dense_matmul(st_c, mb)

print(product)

Put sparse tensors together by using tf.sparse.concat and take them apart by using tf.sparse.slice.

In [ ]:

sparse_pattern_A = tf.sparse.SparseTensor(indices = [[2,4], [3,3], [3,4], [4,3], [4,4], [5,4]],
                         values = [1,1,1,1,1,1],
                         dense_shape = [8,5])
sparse_pattern_B = tf.sparse.SparseTensor(indices = [[0,2], [1,1], [1,3], [2,0], [2,4], [2,5], [3,5], 
                                              [4,5], [5,0], [5,4], [5,5], [6,1], [6,3], [7,2]],
                         values = [1,1,1,1,1,1,1,1,1,1,1,1,1,1],
                         dense_shape = [8,6])
sparse_pattern_C = tf.sparse.SparseTensor(indices = [[3,0], [4,0]],
                         values = [1,1],
                         dense_shape = [8,6])

sparse_patterns_list = [sparse_pattern_A, sparse_pattern_B, sparse_pattern_C]
sparse_pattern = tf.sparse.concat(axis=1, sp_inputs=sparse_patterns_list)
print(tf.sparse.to_dense(sparse_pattern))

In [ ]:

# Slice the concatenated tensor back into its three column blocks:
# columns 0-4 (A), 5-10 (B), and 11-16 (C).
sparse_slice_A = tf.sparse.slice(sparse_pattern, start = [0,0], size = [8,5])
sparse_slice_B = tf.sparse.slice(sparse_pattern, start = [0,5], size = [8,6])
sparse_slice_C = tf.sparse.slice(sparse_pattern, start = [0,11], size = [8,6])
print(tf.sparse.to_dense(sparse_slice_A))
print(tf.sparse.to_dense(sparse_slice_B))
print(tf.sparse.to_dense(sparse_slice_C))

If you are using TensorFlow 2.4 or above, use tf.sparse.map_values for elementwise operations on the nonzero values in sparse tensors.

In [ ]:

st2_plus_5 = tf.sparse.map_values(tf.add, st2, 5)
print(tf.sparse.to_dense(st2_plus_5))

Note that only the nonzero values were modified; the zero values stay zero.

Equivalently, you can follow the design pattern below for earlier versions of TensorFlow.

In [ ]:

st2_plus_5 = tf.sparse.SparseTensor(
    st2.indices,
    st2.values + 5,
    st2.dense_shape)
print(tf.sparse.to_dense(st2_plus_5))

Using `tf.sparse.SparseTensor` with other TensorFlow APIs

Sparse tensors work transparently with these TensorFlow APIs:

tf.keras
tf.data
tf.train.Example protobuf
tf.function
tf.while_loop
tf.cond
tf.identity
tf.cast
tf.print
tf.saved_model
tf.io.serialize_sparse
tf.io.serialize_many_sparse
tf.io.deserialize_many_sparse
tf.math.abs
tf.math.negative
tf.math.sign
tf.math.square
tf.math.sqrt
tf.math.erf
tf.math.tanh
tf.math.bessel_i0e
tf.math.bessel_i1e

Examples are shown below for a few of the above APIs.
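
As a quick sketch of the elementwise entries in the list, the example below applies tf.math.abs, tf.cast, and tf.math.square directly to a sparse tensor. The input data and variable names are illustrative assumptions; the point is only that each call returns another sparse tensor with the same indices.

```python
import tensorflow as tf

st = tf.sparse.from_dense([[-1, 0, 0, 8], [0, 0, 0, 0], [0, 0, -3, 0]])

# Each of these ops acts on the stored values and keeps the sparse structure.
st_abs = tf.math.abs(st)
st_float = tf.cast(st, tf.float32)
st_squared = tf.math.square(st)

print(isinstance(st_abs, tf.sparse.SparseTensor))
print(st_abs.values.numpy().tolist())
print(st_float.dtype)
print(st_squared.values.numpy().tolist())
```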

`tf.keras`

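A subset of the tf.keras API supports sparse tensors without expensive casting or conversion ops. The Keras API lets you pass sparse tensors as inputs to a Keras model. Set sparse=True when calling tf.keras.Input or tf.keras.layers.InputLayer. You can pass sparse tensors between Keras layers, and also have Keras models return them as outputs. If you use sparse tensors in tf.keras.layers.Dense layers in your model, they will output dense tensors.

The example below shows how to pass a sparse tensor as an input to a Keras model if you use only layers that support sparse inputs.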

In [ ]:

x = tf.keras.Input(shape=(4,), sparse=True)
y = tf.keras.layers.Dense(4)(x)
model = tf.keras.Model(x, y)

sparse_data = tf.sparse.SparseTensor(
    indices = [(0,0),(0,1),(0,2),
               (4,3),(5,0),(5,1)],
    values = [1,1,1,1,1,1],
    dense_shape = (6,4)
)

model(sparse_data)

model.predict(sparse_data)

`tf.data`

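The tf.data API enables you to build complex input pipelines from simple, reusable pieces. Its core data structure is tf.data.Dataset, which represents a sequence of elements in which each element consists of one or more components.

Building datasets with sparse tensors

Build datasets from sparse tensors using the same methods that are used to build them from tf.Tensors or NumPy arrays, such as tf.data.Dataset.from_tensor_slices. This op preserves the sparsity of the data.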

In [ ]:

dataset = tf.data.Dataset.from_tensor_slices(sparse_data)
for element in dataset: 
  print(pprint_sparse_tensor(element))

Batching and unbatching datasets with sparse tensors

You can batch (combine consecutive elements into a single element) and unbatch datasets with sparse tensors by using the Dataset.batch and Dataset.unbatch methods respectively.

In [ ]:

batched_dataset = dataset.batch(2)
for element in batched_dataset:
  print (pprint_sparse_tensor(element))

In [ ]:

unbatched_dataset = batched_dataset.unbatch()
for element in unbatched_dataset:
  print (pprint_sparse_tensor(element))

You can also use tf.data.experimental.dense_to_sparse_batch to batch dataset elements of varying shapes into sparse tensors.
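
The following is a minimal sketch of that transformation, assuming a made-up dataset of variable-length vectors. The dataset contents, the batch size, and the row_shape of [4] are illustrative choices; newer TensorFlow releases may print a deprecation warning for this function.

```python
import tensorflow as tf

# Hypothetical dataset of variable-length vectors:
# [1], [2, 2], [3, 3, 3], [4, 4, 4, 4].
varying_length_ds = tf.data.Dataset.range(1, 5).map(
    lambda n: tf.fill(tf.expand_dims(n, 0), n))

# Batch two elements at a time into sparse tensors with rows of length 4.
sparse_batched_ds = varying_length_ds.apply(
    tf.data.experimental.dense_to_sparse_batch(batch_size=2, row_shape=[4]))

# Convert each sparse batch to dense only to inspect it.
batches = [tf.sparse.to_dense(batch).numpy().tolist() for batch in sparse_batched_ds]
for batch in batches:
  print(batch)
```

Positions beyond the end of a shorter element are simply not stored, so they show up as zeros when the batch is densified.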

Transforming datasets with sparse tensors

Transform and create sparse tensors in datasets by using Dataset.map.

In [ ]:

transform_dataset = dataset.map(lambda x: x*2)
for i in transform_dataset:
  print(pprint_sparse_tensor(i))
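
Dataset.map can also create sparse tensors from dense elements. The sketch below assumes a small made-up dense dataset and converts each row with tf.sparse.from_dense; the variable names are hypothetical.

```python
import tensorflow as tf

dense_rows = tf.constant([[1, 0, 0, 8], [0, 0, 0, 0], [0, 0, 3, 0]])
dense_ds = tf.data.Dataset.from_tensor_slices(dense_rows)

# Each dense row of shape [4] becomes a sparse tensor with dense_shape [4].
sparse_ds = dense_ds.map(tf.sparse.from_dense)

# Collect the stored values of every element to see what was kept.
stored_values = [element.values.numpy().tolist() for element in sparse_ds]
print(stored_values)
```

The all-zero row produces a sparse tensor with no stored values.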

tf.train.Example

tf.train.Example is a standard protobuf encoding for TensorFlow data. When using sparse tensors with tf.train.Example, you can:

Read variable-length data into a tf.sparse.SparseTensor using tf.io.VarLenFeature. However, you should consider using tf.io.RaggedFeature instead.
Read arbitrary sparse data into a tf.sparse.SparseTensor using tf.io.SparseFeature, which uses three separate feature keys to store the indices, values, and dense_shape.
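
As a minimal sketch of the first option, the example below serializes two tf.train.Example protos with different numbers of integers and parses them with tf.io.VarLenFeature. The feature key "tokens", the helper make_example, and the data are hypothetical names chosen for illustration.

```python
import tensorflow as tf

def make_example(token_ids):
  # Hypothetical helper: store a list of ints under the key "tokens"
  # and return the serialized tf.train.Example.
  example = tf.train.Example(features=tf.train.Features(feature={
      "tokens": tf.train.Feature(int64_list=tf.train.Int64List(value=token_ids)),
  }))
  return example.SerializeToString()

# Two examples with different lengths.
serialized = [make_example([3, 1, 4]), make_example([5])]

# VarLenFeature parses the variable-length feature into a SparseTensor.
parsed = tf.io.parse_example(serialized, {"tokens": tf.io.VarLenFeature(tf.int64)})
tokens = parsed["tokens"]

print(tokens.indices.numpy().tolist())
print(tokens.values.numpy().tolist())
print(tokens.dense_shape.numpy().tolist())
```

The batch dimension comes first, and the second dimension is as long as the longest example in the batch.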

`tf.function`

The tf.function decorator precomputes TensorFlow graphs for Python functions, which can substantially improve the performance of your TensorFlow code. Sparse tensors work transparently with both tf.function and concrete functions.

In [ ]:

@tf.function
def f(x,y):
  return tf.sparse.sparse_dense_matmul(x,y)

a = tf.sparse.SparseTensor(indices=[[0, 3], [2, 4]],
                    values=[15, 25],
                    dense_shape=[3, 10])

b = tf.sparse.to_dense(tf.sparse.transpose(a))

c = f(a,b)

print(c)
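
For the concrete-function case, here is a hedged sketch that fixes the input signature with tf.SparseTensorSpec and then retrieves and calls the concrete function. The function name total, the float32 signature, and the example data are assumptions made for illustration.

```python
import tensorflow as tf

# Declare that the function takes one rank-2 float32 sparse tensor of any size.
@tf.function(input_signature=[tf.SparseTensorSpec(shape=[None, None], dtype=tf.float32)])
def total(st):
  # Sum of all stored values.
  return tf.sparse.reduce_sum(st)

st = tf.sparse.SparseTensor(indices=[[0, 3], [2, 4]],
                            values=[15.0, 25.0],
                            dense_shape=[3, 10])

# Call through the tf.function wrapper.
result = total(st)

# Retrieve the single traced graph and call it directly.
concrete = total.get_concrete_function()
result_concrete = concrete(st)

print(float(result), float(result_concrete))
```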

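Distinguishing missing values from zero values

Most ops on tf.sparse.SparseTensors treat missing values and explicit zero values identically. This is by design: a tf.sparse.SparseTensor is supposed to act just like a dense tensor.

However, there are a few cases where it can be useful to distinguish zero values from missing values. In particular, this allows for one way to encode missing/unknown data in your training data. For example, consider a use case where you have a tensor of scores (that can have any floating point value from -Inf to +Inf), with some missing scores. You can encode this tensor using a sparse tensor where the explicit zeros are known zero scores but the implicit zero values actually represent missing data and not zero.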
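
The score use case can be sketched as follows. The five-item score vector is made up for illustration: items 0, 2, and 3 have known scores (item 2 is a known score of 0.0), while items 1 and 4 are missing.

```python
import tensorflow as tf

# Known scores are stored explicitly, including the known zero at position 2.
# Positions 1 and 4 are absent, which this encoding reads as "missing".
scores = tf.sparse.SparseTensor(indices=[[0], [2], [3]],
                                values=[1.5, 0.0, -2.0],
                                dense_shape=[5])

# The indices record exactly which positions are known.
known_positions = scores.indices[:, 0].numpy().tolist()

# Averaging only the stored values ignores the missing scores.
mean_known = tf.reduce_mean(scores.values)

# Densifying loses the distinction: missing scores become 0.0 as well.
dense_scores = tf.sparse.to_dense(scores).numpy().tolist()

print(known_positions)
print(float(mean_known))
print(dense_scores)
```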

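Note: This is generally not the intended usage of tf.sparse.SparseTensors, so you might also want to consider other techniques for encoding this, such as using a separate mask tensor that identifies the locations of known/unknown values. If you do rely on explicit zeros, exercise caution, since most sparse operations treat explicit and implicit zero values identically.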
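
A minimal sketch of the mask-tensor alternative, using a made-up dense score vector and a boolean mask named is_known (both hypothetical):

```python
import tensorflow as tf

# Dense scores; the entries at unknown positions are placeholders.
scores = tf.constant([1.5, 0.0, 0.0, -2.0, 0.0])

# True where the score is actually known. Position 2 is a known zero score,
# positions 1 and 4 are missing.
is_known = tf.constant([True, False, True, True, False])

# Keep only the known scores before computing statistics.
known_scores = tf.boolean_mask(scores, is_known)
mean_known = tf.reduce_mean(known_scores)

print(known_scores.numpy().tolist())
print(float(mean_known))
```

Here the known/unknown information lives in the mask, so it does not depend on how any sparse op treats explicit zeros.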

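Note that some ops like tf.sparse.reduce_max do not treat missing values as if they were zero. For example, when you run the code block below, you might expect the output to be 0. However, because of this exception, the output is -3.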

In [ ]:

print(tf.sparse.reduce_max(tf.sparse.from_dense([-5, 0, -3])))

In contrast, when you apply tf.math.reduce_max to a dense tensor, the output is 0 as expected.

In [ ]:

print(tf.math.reduce_max([-5, 0, -3]))
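
To see that the difference comes from whether the zero is stored, the sketch below compares the same vector encoded with an implicit zero (via tf.sparse.from_dense) and with an explicit zero. The variable names are illustrative assumptions.

```python
import tensorflow as tf

# from_dense drops the zero, so only -5 and -3 are stored.
implicit_zero = tf.sparse.from_dense([-5, 0, -3])

# Here the zero at position 1 is stored explicitly as a value.
explicit_zero = tf.sparse.SparseTensor(indices=[[0], [1], [2]],
                                       values=[-5, 0, -3],
                                       dense_shape=[3])

max_implicit = tf.sparse.reduce_max(implicit_zero)
max_explicit = tf.sparse.reduce_max(explicit_zero)

print(int(max_implicit), int(max_explicit))
```

Both tensors convert to the same dense vector, yet the reduction differs because only stored values take part in it.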

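Further reading and resources

Refer to the tensor guide to learn about tensors.
Read the ragged tensor guide to learn how to work with ragged tensors, a type of tensor that lets you work with non-uniform data.
Check out the object detection model in the TensorFlow Model Garden that uses sparse tensors in a tf.Example data decoder.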
