GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/pt-br/hub/tutorials/bert_experts.ipynb

Kernel: Python 3

Copyright 2020 The TensorFlow Hub Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [ ]:

#@title Copyright 2020 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

BERT Experts from TF Hub

This Colab demonstrates how to:

Load BERT models from TensorFlow Hub that have been trained on different tasks, including MNLI, SQuAD, and PubMed
Use a matching preprocessing model to tokenize raw text and convert it to IDs
Generate the pooled and sequence outputs from the token input IDs using the loaded model
Evaluate the semantic similarity of the pooled outputs of different sentences

Note: This Colab should be run with a GPU runtime.

Set up and imports

In [ ]:

!pip install --quiet "tensorflow-text==2.11.*"

In [ ]:

import seaborn as sns
from sklearn.metrics import pairwise

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # Imports TF ops for preprocessing.

In [ ]:

#@title Configure the model { run: "auto" }
BERT_MODEL = "https://tfhub.dev/google/experts/bert/wiki_books/2" # @param {type: "string"} ["https://tfhub.dev/google/experts/bert/wiki_books/2", "https://tfhub.dev/google/experts/bert/wiki_books/mnli/2", "https://tfhub.dev/google/experts/bert/wiki_books/qnli/2", "https://tfhub.dev/google/experts/bert/wiki_books/qqp/2", "https://tfhub.dev/google/experts/bert/wiki_books/squad2/2", "https://tfhub.dev/google/experts/bert/wiki_books/sst2/2",  "https://tfhub.dev/google/experts/bert/pubmed/2", "https://tfhub.dev/google/experts/bert/pubmed/squad2/2"]
# Preprocessing must match the model, but all the above use the same.
PREPROCESS_MODEL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"

Sentences

Let's take some English sentences from Wikipedia to run through the model:

In [ ]:

sentences = [
  "Here We Go Then, You And I is a 1999 album by Norwegian pop artist Morten Abel. It was Abel's second CD as a solo artist.",
  "The album went straight to number one on the Norwegian album chart, and sold to double platinum.",
  "Among the singles released from the album were the songs \"Be My Lover\" and \"Hard To Stay Awake\".",
  "Riccardo Zegna is an Italian jazz musician.",
  "Rajko Maksimović is a composer, writer, and music pedagogue.",
  "One of the most significant Serbian composers of our time, Maksimović has been and remains active in creating works for different ensembles.",
  "Ceylon spinach is a common name for several plants and may refer to: Basella alba Talinum fruticosum",
  "A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partly obscuring the image of the Sun for a viewer on Earth.",
  "A partial solar eclipse occurs in the polar regions of the Earth when the center of the Moon's shadow misses the Earth.",
]

Run the model

We'll load the BERT model from TF Hub, tokenize the sentences using the matching preprocessing model from TF Hub, and then feed the tokenized sentences into the model. To keep this Colab fast and simple, we recommend running on a GPU.

Go to Runtime → Change runtime type to confirm that GPU is selected.

In [ ]:

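# Load the preprocessing (tokenization) model and the BERT encoder from TF Hub.
preprocess = hub.load(PREPROCESS_MODEL)
bert = hub.load(BERT_MODEL)
# Tokenize the raw sentences into the input tensors the encoder expects.
inputs = preprocess(sentences)
# Run the encoder to obtain pooled and per-token embeddings.
outputs = bert(inputs)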
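
To make the preprocessing step more concrete, the sketch below mimics the structure of the dictionary that the preprocessing model is expected to return: `input_word_ids`, `input_mask`, and `input_type_ids`, each padded to a fixed length. It is a toy under stated assumptions: the whitespace tokenizer, `TOY_VOCAB`, `SEQ_LENGTH`, and the special-token ID constants are hypothetical stand-ins, whereas the real model applies WordPiece tokenization with its own vocabulary and sequence length. Printing `inputs` in the next cell shows the actual keys and shapes.

```python
# Toy illustration only: NOT the real TF Hub preprocessing model.
# All names and ID values below are hypothetical and chosen for readability.
SEQ_LENGTH = 8                        # assumed tiny fixed length
CLS_ID, SEP_ID, PAD_ID, UNK_ID = 101, 102, 0, 100  # illustrative special-token IDs
TOY_VOCAB = {"a": 5, "solar": 6, "eclipse": 7, "jazz": 8}  # made-up vocabulary


def toy_preprocess(sentences, vocab, seq_length=SEQ_LENGTH):
    """Build padded ID, mask, and segment lists for a batch of sentences."""
    word_ids, masks, type_ids = [], [], []
    for sentence in sentences:
        # Whitespace split stands in for WordPiece tokenization.
        tokens = [vocab.get(word, UNK_ID) for word in sentence.lower().split()]
        # Wrap in start/end markers, truncating so the result fits seq_length.
        tokens = [CLS_ID] + tokens[: seq_length - 2] + [SEP_ID]
        n_pad = seq_length - len(tokens)
        word_ids.append(tokens + [PAD_ID] * n_pad)
        # The mask is 1 for real tokens and 0 for padding.
        masks.append([1] * len(tokens) + [0] * n_pad)
        # Single-segment input: every position belongs to segment 0.
        type_ids.append([0] * seq_length)
    return {
        "input_word_ids": word_ids,
        "input_mask": masks,
        "input_type_ids": type_ids,
    }


toy_inputs = toy_preprocess(["a solar eclipse", "jazz"], TOY_VOCAB)
for key, value in toy_inputs.items():
    print(key, value)
```

Every sentence ends up with the same length regardless of how many words it has, which is what lets the encoder process the whole batch at once; the mask tells it which positions to ignore.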

In [ ]:

print("Sentences:")
print(sentences)

print("\nBERT inputs:")
print(inputs)

print("\nPooled embeddings:")
print(outputs["pooled_output"])

print("\nPer token embeddings:")
print(outputs["sequence_output"])

Semantic similarity

Now let's take a look at the pooled_output embeddings of the sentences and compare how similar they are across sentences.

In [ ]:

#@title Helper functions

def plot_similarity(features, labels):
  """Plot a similarity matrix of the embeddings."""
  cos_sim = pairwise.cosine_similarity(features)
  sns.set(font_scale=1.2)
  cbar_kws=dict(use_gridspec=False, location="left")
  g = sns.heatmap(
      cos_sim, xticklabels=labels, yticklabels=labels,
      vmin=0, vmax=1, cmap="Blues", cbar_kws=cbar_kws)
  g.tick_params(labelright=True, labelleft=False)
  g.set_yticklabels(labels, rotation=0)
  g.set_title("Semantic Textual Similarity")

In [ ]:

plot_similarity(outputs["pooled_output"], sentences)
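
The heatmap relies on cosine similarity, which compares the direction of two embedding vectors and ignores their length. The following is a minimal NumPy sketch of that computation on made-up 2-D vectors, plus a lookup of each row's most similar neighbor. The names `cosine_similarity_matrix`, `nearest_neighbors`, and `toy_embeddings` are hypothetical; the notebook itself uses `pairwise.cosine_similarity` from scikit-learn.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Return the matrix of pairwise cosine similarities between rows."""
    features = np.asarray(features, dtype=np.float64)
    # Scale each row to unit length so the dot product equals the cosine.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms
    return unit @ unit.T


def nearest_neighbors(sim):
    """For each row, return the index of the most similar *other* row."""
    sim = sim.copy()
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    return sim.argmax(axis=1)


# Made-up vectors: rows 0 and 1 point the same way, row 2 is orthogonal.
toy_embeddings = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
toy_sim = cosine_similarity_matrix(toy_embeddings)
nearest = nearest_neighbors(toy_sim)
print(toy_sim)
print(nearest)
```

Rows 0 and 1 differ in length but score 1.0 because they share a direction. Note that `plot_similarity` fixes the color scale to the range 0 to 1, so any negative cosine values would be drawn in the same color as 0.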

Learn more

Find more BERT models on TensorFlow Hub
This notebook demonstrates simple inference with BERT. For a more advanced tutorial on fine-tuning BERT, see tensorflow.org/official_models/fine_tuning_bert
We used just one GPU chip to run the model. To learn how to load models using tf.distribute, see tensorflow.org/tutorials/distribute/save_and_load