GitHub Repository: rasbt/machine-learning-book
Path: blob/main/ch16/ch16-part3-bert.ipynb
Kernel: Python 3 (ipykernel)

Machine Learning with PyTorch and Scikit-Learn

-- Code Examples

Package version checks

Add folder to path in order to load from the check_packages.py script:

import sys

sys.path.insert(0, '..')

Check recommended package versions:

from python_environment_check import check_packages

d = {
    'pandas': '1.3.2',
    'torch': '1.9.0',
    'torchtext': '0.11.0',
    'datasets': '1.11.0',
    'transformers': '4.9.1',
}
check_packages(d)
[OK] Your Python version is 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
[OK] pandas 1.3.5
[OK] torch 1.10.0
[OK] torchtext 0.11.0
[OK] datasets 1.11.0
[OK] transformers 4.9.1

Chapter 16: Transformers – Improving Natural Language Processing with Attention Mechanisms (Part 3/3)


Quote from https://huggingface.co/transformers/custom_datasets.html:

DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% less parameters than bert-base-uncased, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark.

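The claimed size reduction is easy to check directly. The following sketch (not part of the original notebook) counts the trainable parameters of both checkpoints using the transformers AutoModel class; note that it downloads the full bert-base-uncased model in addition to distilbert-base-uncased:

from transformers import AutoModel

def count_parameters(model):
    # illustrative helper: total number of trainable parameter elements
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

distilbert = AutoModel.from_pretrained('distilbert-base-uncased')
bert = AutoModel.from_pretrained('bert-base-uncased')

print(f'DistilBERT parameters: {count_parameters(distilbert):,}')
print(f'BERT base parameters:  {count_parameters(bert):,}')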

from IPython.display import Image

Fine-tuning a BERT model in PyTorch

Loading the IMDb movie review dataset

import gzip
import shutil
import time

import pandas as pd
import requests
import torch
import torch.nn.functional as F
import torchtext
import transformers
from transformers import DistilBertTokenizerFast
from transformers import DistilBertForSequenceClassification

General Settings

torch.backends.cudnn.deterministic = True
RANDOM_SEED = 123
torch.manual_seed(RANDOM_SEED)

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

NUM_EPOCHS = 3

Download Dataset

The following cells will download the IMDb movie review dataset (http://ai.stanford.edu/~amaas/data/sentiment/) for positive-negative sentiment classification as a CSV-formatted file:

url = "https://github.com/rasbt/machine-learning-book/raw/main/ch08/movie_data.csv.gz"
filename = url.split("/")[-1]

with open(filename, "wb") as f:
    r = requests.get(url)
    f.write(r.content)

with gzip.open('movie_data.csv.gz', 'rb') as f_in:
    with open('movie_data.csv', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

Check that the dataset looks okay:

df = pd.read_csv('movie_data.csv')
df.head()
df.shape
(50000, 2)

Split Dataset into Train/Validation/Test

train_texts = df.iloc[:35000]['review'].values
train_labels = df.iloc[:35000]['sentiment'].values

valid_texts = df.iloc[35000:40000]['review'].values
valid_labels = df.iloc[35000:40000]['sentiment'].values

test_texts = df.iloc[40000:]['review'].values
test_labels = df.iloc[40000:]['sentiment'].values

Tokenizing the dataset

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True)
valid_encodings = tokenizer(list(valid_texts), truncation=True, padding=True)
test_encodings = tokenizer(list(test_texts), truncation=True, padding=True)
train_encodings[0]
Encoding(num_tokens=512, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])
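To see what the Encoding object above contains, the token IDs can be mapped back to subword strings (a quick check, not part of the original notebook):

# Not part of the original notebook: inspect the first encoded review.
# Each review was truncated/padded to 512 token IDs; the attention mask
# distinguishes real tokens (1) from padding (0).
print(train_encodings['input_ids'][0][:10])
print(tokenizer.convert_ids_to_tokens(train_encodings['input_ids'][0][:10]))
print(train_encodings['attention_mask'][0][:10])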

Dataset Class and Loaders

class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)


train_dataset = IMDbDataset(train_encodings, train_labels)
valid_dataset = IMDbDataset(valid_encodings, valid_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=16, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=16, shuffle=False)
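As an optional sanity check (not part of the original notebook), drawing a single mini-batch from the training loader shows the tensors that will be fed to the model:

# Optional sanity check: shapes of one mini-batch from the training loader
batch = next(iter(train_loader))
print(batch['input_ids'].shape)       # e.g., torch.Size([16, 512])
print(batch['attention_mask'].shape)  # same shape as input_ids
print(batch['labels'].shape)          # torch.Size([16])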

Loading and fine-tuning a pre-trained BERT model

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(DEVICE)
model.train()

optim = torch.optim.Adam(model.parameters(), lr=5e-5)
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'classifier.bias', 'pre_classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
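Before running the full training loop below, a single forward pass on one mini-batch can be used to confirm the model's input/output interface (a sketch, not part of the original notebook):

# Sketch: one forward pass to inspect the model output used in the training loop
batch = next(iter(train_loader))
with torch.no_grad():
    outputs = model(batch['input_ids'].to(DEVICE),
                    attention_mask=batch['attention_mask'].to(DEVICE),
                    labels=batch['labels'].to(DEVICE))
print(outputs['loss'])          # scalar cross-entropy loss
print(outputs['logits'].shape)  # torch.Size([16, 2]): one logit per class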

Train Model -- Manual Training Loop

def compute_accuracy(model, data_loader, device):

    with torch.no_grad():

        correct_pred, num_examples = 0, 0

        for batch_idx, batch in enumerate(data_loader):

            ### Prepare data
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids, attention_mask=attention_mask)
            logits = outputs['logits']
            predicted_labels = torch.argmax(logits, 1)

            num_examples += labels.size(0)
            correct_pred += (predicted_labels == labels).sum()

    return correct_pred.float()/num_examples * 100
start_time = time.time()

for epoch in range(NUM_EPOCHS):

    model.train()

    for batch_idx, batch in enumerate(train_loader):

        ### Prepare data
        input_ids = batch['input_ids'].to(DEVICE)
        attention_mask = batch['attention_mask'].to(DEVICE)
        labels = batch['labels'].to(DEVICE)

        ### Forward
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss, logits = outputs['loss'], outputs['logits']

        ### Backward
        optim.zero_grad()
        loss.backward()
        optim.step()

        ### Logging
        if not batch_idx % 250:
            print(f'Epoch: {epoch+1:04d}/{NUM_EPOCHS:04d} | '
                  f'Batch {batch_idx:04d}/{len(train_loader):04d} | '
                  f'Loss: {loss:.4f}')

    model.eval()

    with torch.set_grad_enabled(False):
        print(f'Training accuracy: '
              f'{compute_accuracy(model, train_loader, DEVICE):.2f}%'
              f'\nValid accuracy: '
              f'{compute_accuracy(model, valid_loader, DEVICE):.2f}%')

    print(f'Time elapsed: {(time.time() - start_time)/60:.2f} min')

print(f'Total Training Time: {(time.time() - start_time)/60:.2f} min')
print(f'Test accuracy: {compute_accuracy(model, test_loader, DEVICE):.2f}%')
Epoch: 0001/0003 | Batch 0000/2188 | Loss: 0.6771
Epoch: 0001/0003 | Batch 0250/2188 | Loss: 0.3006
Epoch: 0001/0003 | Batch 0500/2188 | Loss: 0.3678
Epoch: 0001/0003 | Batch 0750/2188 | Loss: 0.1487
Epoch: 0001/0003 | Batch 1000/2188 | Loss: 0.6674
Epoch: 0001/0003 | Batch 1250/2188 | Loss: 0.3264
Epoch: 0001/0003 | Batch 1500/2188 | Loss: 0.4358
Epoch: 0001/0003 | Batch 1750/2188 | Loss: 0.2579
Epoch: 0001/0003 | Batch 2000/2188 | Loss: 0.2474
Training accuracy: 96.32%
Valid accuracy: 92.34%
Time elapsed: 20.67 min
Epoch: 0002/0003 | Batch 0000/2188 | Loss: 0.0850
Epoch: 0002/0003 | Batch 0250/2188 | Loss: 0.3433
Epoch: 0002/0003 | Batch 0500/2188 | Loss: 0.0793
Epoch: 0002/0003 | Batch 0750/2188 | Loss: 0.0061
Epoch: 0002/0003 | Batch 1000/2188 | Loss: 0.1536
Epoch: 0002/0003 | Batch 1250/2188 | Loss: 0.0816
Epoch: 0002/0003 | Batch 1500/2188 | Loss: 0.0786
Epoch: 0002/0003 | Batch 1750/2188 | Loss: 0.1395
Epoch: 0002/0003 | Batch 2000/2188 | Loss: 0.0344
Training accuracy: 98.35%
Valid accuracy: 92.46%
Time elapsed: 41.41 min
Epoch: 0003/0003 | Batch 0000/2188 | Loss: 0.0403
Epoch: 0003/0003 | Batch 0250/2188 | Loss: 0.0036
Epoch: 0003/0003 | Batch 0500/2188 | Loss: 0.0156
Epoch: 0003/0003 | Batch 0750/2188 | Loss: 0.0114
Epoch: 0003/0003 | Batch 1000/2188 | Loss: 0.1227
Epoch: 0003/0003 | Batch 1250/2188 | Loss: 0.0125
Epoch: 0003/0003 | Batch 1500/2188 | Loss: 0.0074
Epoch: 0003/0003 | Batch 1750/2188 | Loss: 0.0202
Epoch: 0003/0003 | Batch 2000/2188 | Loss: 0.0746
Training accuracy: 99.08%
Valid accuracy: 91.84%
Time elapsed: 62.15 min
Total Training Time: 62.15 min
Test accuracy: 92.50%
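Optionally, the fine-tuned weights can be saved before the model object is deleted in the next cell, so the hour-long training run does not have to be repeated (a sketch, not part of the original notebook; the output directory name is arbitrary):

# Optional: persist the fine-tuned model and tokenizer for later reuse
model.save_pretrained('./distilbert-imdb-manual')      # hypothetical output directory
tokenizer.save_pretrained('./distilbert-imdb-manual')
# reload later with:
# model = DistilBertForSequenceClassification.from_pretrained('./distilbert-imdb-manual')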
del model # free memory

Fine-tuning a transformer more conveniently using the Trainer API

Reload pretrained model:

model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(DEVICE)
model.train();
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'classifier.bias', 'pre_classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
from transformers import Trainer, TrainingArguments

optim = torch.optim.Adam(model.parameters(), lr=5e-5)

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    logging_dir='./logs',
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
# install the datasets package via: pip install datasets
from datasets import load_metric
import numpy as np

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred  # logits are a NumPy array, not a PyTorch tensor
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(
        predictions=predictions, references=labels)
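The Trainer only requires that compute_metrics returns a dictionary of named scalar values, so the accuracy could also be computed directly with NumPy instead of the datasets metric (an alternative sketch, not used in the original notebook):

# Alternative sketch: accuracy without datasets.load_metric
def compute_metrics_np(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {'accuracy': (predictions == labels).mean()}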
optim = torch.optim.Adam(model.parameters(), lr=5e-5)

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    logging_dir='./logs',
    logging_steps=10
)

trainer = Trainer(
    model=model,
    compute_metrics=compute_metrics,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    optimizers=(optim, None)  # optimizer and learning rate scheduler
)

# force the model to only use 1 GPU (even if multiple are available)
# to compare more fairly to the previous code
trainer.args._n_gpu = 1
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
start_time = time.time()
trainer.train()
print(f'Total Training Time: {(time.time() - start_time)/60:.2f} min')
***** Running training *****
  Num examples = 35000
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 6564
Saving model checkpoint to ./results/checkpoint-500
Configuration saved in ./results/checkpoint-500/config.json
Model weights saved in ./results/checkpoint-500/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-1000
Configuration saved in ./results/checkpoint-1000/config.json
Model weights saved in ./results/checkpoint-1000/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-1500
Configuration saved in ./results/checkpoint-1500/config.json
Model weights saved in ./results/checkpoint-1500/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-2000
Configuration saved in ./results/checkpoint-2000/config.json
Model weights saved in ./results/checkpoint-2000/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-2500
Configuration saved in ./results/checkpoint-2500/config.json
Model weights saved in ./results/checkpoint-2500/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-3000
Configuration saved in ./results/checkpoint-3000/config.json
Model weights saved in ./results/checkpoint-3000/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-3500
Configuration saved in ./results/checkpoint-3500/config.json
Model weights saved in ./results/checkpoint-3500/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-4000
Configuration saved in ./results/checkpoint-4000/config.json
Model weights saved in ./results/checkpoint-4000/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-4500
Configuration saved in ./results/checkpoint-4500/config.json
Model weights saved in ./results/checkpoint-4500/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-5000
Configuration saved in ./results/checkpoint-5000/config.json
Model weights saved in ./results/checkpoint-5000/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-5500
Configuration saved in ./results/checkpoint-5500/config.json
Model weights saved in ./results/checkpoint-5500/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-6000
Configuration saved in ./results/checkpoint-6000/config.json
Model weights saved in ./results/checkpoint-6000/pytorch_model.bin
Saving model checkpoint to ./results/checkpoint-6500
Configuration saved in ./results/checkpoint-6500/config.json
Model weights saved in ./results/checkpoint-6500/pytorch_model.bin
Training completed. Do not forget to share your model on huggingface.co/models =)
Total Training Time: 45.36 min
trainer.evaluate()
***** Running Evaluation *****
  Num examples = 10000
  Batch size = 16
{'eval_loss': 0.30534815788269043, 'eval_accuracy': 0.9327, 'eval_runtime': 87.1161, 'eval_samples_per_second': 114.789, 'eval_steps_per_second': 7.174, 'epoch': 3.0}
model.eval()
model.to(DEVICE)

print(f'Test accuracy: {compute_accuracy(model, test_loader, DEVICE):.2f}%')
Test accuracy: 93.27%
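Finally, the fine-tuned model can be used to classify new text. The following sketch (not part of the original notebook) scores a hypothetical review; in this dataset, label 1 corresponds to positive sentiment:

# Sketch: sentiment prediction for a single new review
sample_review = "This movie exceeded my expectations, great acting and a touching story."
inputs = tokenizer(sample_review, truncation=True,
                   return_tensors='pt').to(DEVICE)
with torch.no_grad():
    logits = model(**inputs)['logits']
print('positive' if logits.argmax(dim=-1).item() == 1 else 'negative')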

...


Readers may ignore the next cell.

! python ../.convert_notebook_to_script.py --input ch16-part3-bert.ipynb --output ch16-part3-bert.py
[NbConvertApp] WARNING | Config option `kernel_spec_manager_class` not recognized by `NbConvertApp`.
[NbConvertApp] Converting notebook ch16-part3-bert.ipynb to script
[NbConvertApp] Writing 9089 bytes to ch16-part3-bert.py