Finetuning a DistilBERT Classifier Using the Lightning Trainer
1 Loading the Dataset
The IMDB movie review dataset consists of 50,000 movie reviews, each with a binary sentiment label (0: negative, 1: positive).
1a) Load from datasets Hub
1b) Load from local directory
The IMDB movie review set can be downloaded from http://ai.stanford.edu/~amaas/data/sentiment/. After downloading the dataset, decompress the files.
A) If you are working with Linux or macOS, open a new terminal window, cd into the download directory, and execute tar -zxf aclImdb_v1.tar.gz
B) If you are working with Windows, download an archiver such as 7-Zip to extract the files from the download archive.
C) Use the following code to download and unzip the dataset via Python
Download the movie reviews
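The download-and-unzip step can be sketched with the standard library alone. This is a minimal sketch rather than the notebook's original cell: the function name `download_and_extract` is hypothetical, and the archive name `aclImdb_v1.tar.gz` is assumed to still be the file served from the Stanford page above.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: the archive is still published under this name on the Stanford page.
IMDB_URL = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def download_and_extract(url: str = IMDB_URL, target_dir: str = ".") -> Path:
    """Download a .tar.gz archive (unless already present) and unpack it into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive_path = target / url.rsplit("/", 1)[-1]

    # Skip the download when the archive is already on disk.
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)

    # Only unpack archives that come from a source you trust.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=target)
    return target
```

Calling `download_and_extract()` with the defaults should leave an `aclImdb` folder in the current directory, provided the archive layout is unchanged.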
Convert them to a pandas DataFrame and save them as CSV
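One way to do the conversion is to walk the extracted folders and collect each review with its label. The sketch below assumes the archive unpacks to `aclImdb/{train,test}/{pos,neg}/*.txt`. The function name `reviews_to_dataframe` and the column names `text` and `label` are illustrative choices and may differ from the notebook's own.

```python
from pathlib import Path

import pandas as pd

# Folder name -> integer label, matching the encoding 0: negative, 1: positive.
LABELS = {"pos": 1, "neg": 0}


def reviews_to_dataframe(basepath: str = "aclImdb") -> pd.DataFrame:
    """Read every review file under basepath into a two-column DataFrame."""
    rows = []
    for split in ("test", "train"):
        for folder, label in LABELS.items():
            review_dir = Path(basepath) / split / folder
            # sorted() keeps the row order reproducible across file systems.
            for path in sorted(review_dir.glob("*.txt")):
                rows.append({"text": path.read_text(encoding="utf-8"), "label": label})
    return pd.DataFrame(rows, columns=["text", "label"])


# Usage (the file name movie_data.csv is an assumption):
# df = reviews_to_dataframe("aclImdb")
# df.to_csv("movie_data.csv", index=False, encoding="utf-8")
```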
Basic dataset analysis and sanity checks
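A few quick checks catch most loading mistakes before any training starts. The helper below is a hypothetical sketch that assumes a DataFrame with `text` and `label` columns. For the full IMDB data one would expect 50,000 rows and a balanced label count.

```python
import pandas as pd


def sanity_check(df: pd.DataFrame) -> dict:
    """Summarize size, missing values, duplicate rows, and class balance."""
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "n_duplicates": int(df.duplicated().sum()),
        # Number of reviews per class, keyed by label value.
        "label_counts": df["label"].value_counts().sort_index().to_dict(),
    }
```

A strongly skewed `label_counts` or a non-zero `n_missing` would suggest that a folder was skipped or a file failed to decode.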
Split data into training, validation, and test sets
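The split can be sketched as a shuffle followed by positional slicing. The fractions and seed below are assumptions, not values confirmed by the source. With 50,000 rows, the defaults give 35,000 training, 5,000 validation, and 10,000 test examples.

```python
import pandas as pd


def split_dataframe(df: pd.DataFrame, train_frac: float = 0.7,
                    val_frac: float = 0.1, seed: int = 1):
    """Shuffle df and split it into train, validation, and test partitions."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    test = shuffled.iloc[n_train + n_val:]  # whatever remains
    return train, val, test


# Usage (file names are assumptions):
# train, val, test = split_dataframe(df)
# train.to_csv("train.csv", index=False, encoding="utf-8")
# val.to_csv("val.csv", index=False, encoding="utf-8")
# test.to_csv("test.csv", index=False, encoding="utf-8")
```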
2 Tokenization and Numericalization
Load the dataset via load_dataset
Tokenize the dataset
3 Set Up DataLoaders
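A PyTorch `DataLoader` needs a map-style dataset that returns one example per index. The sketch below wraps already tokenized data in a hypothetical `EncodedReviews` class. It assumes every sequence is padded to the same length, and the batch size of 12 is an assumed value.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class EncodedReviews(Dataset):
    """Map-style dataset over tokenized reviews and their labels."""

    def __init__(self, encodings: dict, labels: list):
        # Assumes every sequence has the same (padded) length.
        self.input_ids = torch.tensor(encodings["input_ids"])
        self.attention_mask = torch.tensor(encodings["attention_mask"])
        self.labels = torch.tensor(labels)

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx: int) -> dict:
        return {
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "label": self.labels[idx],
        }


def make_loader(dataset: Dataset, batch_size: int = 12, shuffle: bool = False) -> DataLoader:
    """Build a DataLoader; shuffle only the training partition."""
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```

Typically the training loader is created with `shuffle=True`, while the validation and test loaders keep the default order.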
4 Initializing DistilBERT
5 Finetuning
Wrap in LightningModule for Training