
Chapter 16: Transformers – Improving Natural Language Processing with Attention Mechanisms

Chapter Outline

  • Adding an attention mechanism to RNNs

    • Attention helps RNNs with accessing information

    • The original attention mechanism for RNNs

    • Processing the inputs using a bidirectional RNN

    • Generating outputs from context vectors

    • Computing the attention weights

  • Introducing the self-attention mechanism

    • Starting with a basic form of self-attention

    • Parameterizing the self-attention mechanism: scaled dot-product attention

  • Attention is all we need: introducing the original transformer architecture

    • Encoding context embeddings via multi-head attention

    • Learning a language model: decoder and masked multi-head attention

    • Implementation details: positional encodings and layer normalization

  • Building large-scale language models by leveraging unlabeled data

    • Pre-training and fine-tuning transformer models

    • Leveraging unlabeled data with GPT

    • Using GPT-2 to generate new text

    • Bidirectional pre-training with BERT

    • The best of both worlds: BART

  • Fine-tuning a BERT model in PyTorch

    • Loading the IMDb movie review dataset

    • Tokenizing the dataset

    • Loading and fine-tuning a pre-trained BERT model

    • Fine-tuning a transformer more conveniently using the Trainer API

  • Summary

Please refer to the README.md file in ../ch01 for more information about running the code examples.
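
The sketches below illustrate, in simplified form, a few of the techniques listed in the outline above. They are minimal stand-ins with illustrative names, dimensions, and hyperparameters, not the chapter's own code; the chapter's notebooks remain the reference implementation.

The first section covers the original attention mechanism for RNNs: a (bidirectional) encoder produces one hidden state per input position, alignment scores between the current decoder state and those hidden states are softmax-normalized into attention weights, and the resulting context vector is the attention-weighted sum of the encoder states. A minimal sketch of that computation, assuming a simple dot-product scoring function (the original mechanism uses a small learned scoring network):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, hidden_dim = 5, 8

# One hidden state per input position, e.g. from a bidirectional encoder RNN
# (random values stand in for real encoder outputs).
encoder_states = torch.randn(seq_len, hidden_dim)

# Current decoder hidden state (in practice produced by the decoder RNN).
decoder_state = torch.randn(hidden_dim)

# Alignment scores between the decoder state and every encoder state
# (dot-product scoring for brevity).
scores = encoder_states @ decoder_state       # shape: (seq_len,)

# Attention weights: normalized scores that sum to 1 over the input positions.
attn_weights = F.softmax(scores, dim=0)       # shape: (seq_len,)

# Context vector: attention-weighted sum of the encoder hidden states.
context = attn_weights @ encoder_states       # shape: (hidden_dim,)

print(attn_weights.sum())   # ~1.0
print(context.shape)        # torch.Size([8])
```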
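
The self-attention section introduces scaled dot-product attention: each input embedding is projected into query, key, and value vectors, and the output is softmax(QKᵀ / √d_k) V. A sketch for a single sequence, assuming randomly initialized projection matrices and illustrative dimensions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)
seq_len, d_model, d_k = 6, 16, 16

# Token embeddings for one sequence (illustrative random data).
x = torch.randn(seq_len, d_model)

# Projection matrices for queries, keys, and values (learned in practice).
W_q = torch.randn(d_model, d_k)
W_k = torch.randn(d_model, d_k)
W_v = torch.randn(d_model, d_k)

queries = x @ W_q          # (seq_len, d_k)
keys    = x @ W_k          # (seq_len, d_k)
values  = x @ W_v          # (seq_len, d_k)

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
scores = queries @ keys.T / d_k**0.5          # (seq_len, seq_len)
attn_weights = F.softmax(scores, dim=-1)      # each row sums to 1
context = attn_weights @ values               # (seq_len, d_k)

print(context.shape)   # torch.Size([6, 16])
```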
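
Among the implementation details of the original transformer, sinusoidal positional encodings are added to the token embeddings so that the otherwise order-agnostic attention layers can exploit word order, with PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A sketch of that encoding table (sequence length and model dimension are illustrative):

```python
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) table of sinusoidal positional encodings."""
    positions = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    dims = torch.arange(0, d_model, 2, dtype=torch.float32)              # even dimension indices
    div_term = torch.pow(10000.0, dims / d_model)                        # 10000^(2i / d_model)

    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(positions / div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(positions / div_term)   # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # torch.Size([50, 16])
```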
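
For generating new text with a pre-trained GPT-2 model, the Hugging Face transformers library provides a high-level text-generation pipeline; the prompt and sampling settings below are illustrative assumptions:

```python
from transformers import pipeline, set_seed

# Download (on first use) the pre-trained GPT-2 weights and build a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)   # make the sampled continuations reproducible

outputs = generator(
    "Hey readers, today is",   # illustrative prompt
    max_length=30,             # total length of prompt plus generated tokens
    num_return_sequences=3,    # sample three alternative continuations
)

for out in outputs:
    print(out["generated_text"])
```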
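
The final section fine-tunes a pre-trained BERT-family model on the IMDb movie review dataset, first by hand and then more conveniently via the Trainer API. A condensed sketch using the Hugging Face datasets and transformers libraries; the checkpoint name, hyperparameters, and small training subsets are illustrative assumptions:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load the IMDb movie review dataset (25,000 labeled reviews each for train and test).
imdb = load_dataset("imdb")

# Tokenize the raw review texts into input IDs and attention masks.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

imdb_tokenized = imdb.map(tokenize, batched=True)

# Small subsets keep this sketch quick to run; use the full splits for real training.
train_ds = imdb_tokenized["train"].shuffle(seed=42).select(range(2000))
eval_ds = imdb_tokenized["test"].shuffle(seed=42).select(range(500))

# Pre-trained encoder with a fresh binary classification head on top.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)

trainer.train()
print(trainer.evaluate())
```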