Path: blob/master/Generative NLP Models using Python/6.1 Transformer Models.ipynb
# PyTorch is working
# But GPU acceleration is not being used (CUDA: false)
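A minimal sketch of the check behind the comments above. It assumes nothing about the local environment: it only probes CUDA if PyTorch is importable, so it degrades gracefully on machines without torch installed.

```python
import importlib.util

# Probe for PyTorch before importing it, so the check also works
# in environments where torch is not installed.
if importlib.util.find_spec("torch") is not None:
    import torch
    print("PyTorch version:", torch.__version__)
    # False here means PyTorch will fall back to CPU execution.
    print("CUDA available:", torch.cuda.is_available())
else:
    print("PyTorch is not installed in this environment")
```

On the machine this notebook ran on, the second line would print `CUDA available: False`, matching the comment above.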
Below is simplified Python code demonstrating how self-attention might be applied in a neural machine translation scenario using the transformers library.
This code uses the BERT model from the transformers library to tokenize the input text, compute its hidden states, and extract the self-attention weights. These weights indicate how much each token attends to every other token in each layer of the model. However, note that BERT is not specifically trained for machine translation, so this is just an illustration of self-attention in a language model context.
Steps
Tokenization: The input text "Ashi is beautiful." is tokenized into its constituent tokens using the BERT tokenizer. Each token is represented by an integer ID. Let's denote the tokenized input as X.
Model Computation: The tokenized input X is fed into the BERT model, which consists of multiple layers of self-attention and feedforward neural networks. The BERT model processes the input tokens and produces hidden states for each token. Let's denote the hidden states as H.
Self-Attention: In each layer of the BERT model, self-attention is applied to the input tokens. The self-attention mechanism computes attention scores between each token and every other token in the sequence, using the scaled dot-product attention formula: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are the query, key, and value projections of the hidden states and d_k is the dimensionality of the keys.
Self-Attention Weights: The self-attention weights represent the importance of each token attending to every other token in the sequence. These weights are computed for each layer of the model. In the code, the mean of the attention weights across the sequence dimension is calculated for each layer and printed out.
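The steps above can be sketched in plain Python. This toy version uses the token embeddings directly as Q, K, and V (no learned projections, unlike BERT's per-head weight matrices), so it only illustrates the mechanics: dot-product scores, softmax normalization, and the attention-weighted sum of values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Toy self-attention with Q = K = V = X.
    Returns one attention-weighted output row per token."""
    d_k = len(X[0])
    out = []
    for q in X:
        # Scaled dot-product score of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in X]
        weights = softmax(scores)  # attention weights sum to 1
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d_k)])
    return out

# Three "token" embeddings of dimension 2 stand in for real hidden states.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(X):
    print([round(v, 3) for v in row])
```

Because each output row is a convex combination of the value rows, every entry stays inside the range of the corresponding input column; BERT does the same computation per head, just with learned projection matrices and over real token embeddings.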
C:\Users\Suyashi144893\AppData\Local\anaconda3\envs\torch_fix\lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Loading weights: 100%|████████████████████████████████████████████████████████████| 199/199 [00:00<00:00, 10694.00it/s]
BertModel LOAD REPORT from: bert-base-multilingual-cased
Key | Status | |
-------------------------------------------+------------+--+-
cls.predictions.transform.dense.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
cls.predictions.transform.dense.weight | UNEXPECTED | |
cls.predictions.bias | UNEXPECTED | |
cls.seq_relationship.weight | UNEXPECTED | |
cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
cls.seq_relationship.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Language Translation
Multi-Language Translation Model
Improvement with Gradio App
Switching to a single multilingual model is the right architecture for scalability and avoids managing multiple models.
Implementation using Facebook M2M100, which supports 100+ languages.
Language Mapping
languages = {
    "English": "en", "Hindi": "hi", "French": "fr", "German": "de",
    "Spanish": "es", "Chinese (Simplified)": "zh", "Arabic": "ar",
    "Russian": "ru", "Portuguese": "pt", "Italian": "it", "Japanese": "ja",
    "Korean": "ko", "Dutch": "nl", "Turkish": "tr", "Polish": "pl",
    "Swedish": "sv", "Danish": "da", "Finnish": "fi", "Greek": "el",
    "Czech": "cs", "Hungarian": "hu", "Romanian": "ro", "Ukrainian": "uk",
    "Thai": "th", "Vietnamese": "vi", "Indonesian": "id", "Malay": "ms",
    "Bengali": "bn", "Tamil": "ta", "Telugu": "te", "Marathi": "mr",
    "Gujarati": "gu", "Punjabi": "pa", "Urdu": "ur",
}
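A minimal sketch of how this mapping can back a Gradio dropdown: the helper below (`to_lang_code` is a hypothetical name, not from the notebook) resolves a display name to the ISO code M2M100 expects and fails loudly for unsupported names, rather than passing a bad code to the tokenizer. The dict is abbreviated here for brevity.

```python
# Display-name → ISO-code mapping (abbreviated; the full notebook
# dict covers 34 languages).
languages = {
    "English": "en", "Hindi": "hi", "French": "fr",
    "German": "de", "Spanish": "es", "Japanese": "ja",
}

def to_lang_code(name: str) -> str:
    """Resolve a dropdown display name to the code M2M100 expects."""
    try:
        return languages[name]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None

print(to_lang_code("Hindi"))  # hi
```

In the M2M100 setup, the resolved codes would be passed as the tokenizer's source language and the `forced_bos_token_id` target language when generating the translation.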
Using Gradio
Full Multi-Task NLP App
Loading weights: 100%|█████████████████████████████████████████████████████████████| 104/104 [00:00<00:00, 2896.18it/s]
Loading weights: 100%|█████████████████████████████████████████████████████████████| 199/199 [00:00<00:00, 2355.36it/s]
BertForTokenClassification LOAD REPORT from: dslim/bert-base-NER
Key | Status | |
-------------------------+------------+--+-
bert.pooler.dense.bias | UNEXPECTED | |
bert.pooler.dense.weight | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[10], line 18
7 sentiment = pipeline(
8 "sentiment-analysis",
9 model="distilbert-base-uncased-finetuned-sst-2-english"
10 )
12 ner = pipeline(
13 "ner",
14 model="dslim/bert-base-NER",
15 aggregation_strategy="simple"
16 )
---> 18 qa = pipeline(
19 "question-answering",
20 model="distilbert-base-cased-distilled-squad"
21 )
23 summarizer = pipeline(
24 "summarization",
25 model="t5-small"
26 )
28 text2text = pipeline(
29 "text2text-generation",
30 model="t5-small"
31 )
, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, processor, revision, use_fast, token, device, device_map, dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
735 pipeline_class = get_class_from_dynamic_module(
736 class_ref,
737 model,
738 code_revision=code_revision,
739 **hub_kwargs,
740 )
741 else:
--> 742 normalized_task, targeted_task, task_options = check_task(task)
743 if pipeline_class is None:
744 pipeline_class = targeted_task["impl"]
, in check_task(task)
319 def check_task(task: str) -> tuple[str, dict, Any]:
320 """
321 Checks an incoming task string, to validate it's correct and return the default Pipeline and Model classes, and
322 default models if they exist.
(...)
353
354 """
--> 355 return PIPELINE_REGISTRY.check_task(task)
, in PipelineRegistry.check_task(self, task)
1338 targeted_task = self.supported_tasks[task]
1339 return task, targeted_task, None
-> 1341 raise KeyError(f"Unknown task {task}, available tasks are {self.get_supported_tasks()}")
KeyError: "Unknown task question-answering, available tasks are ['any-to-any', 'audio-classification', 'automatic-speech-recognition', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-feature-extraction', 'image-segmentation', 'image-text-to-text', 'keypoint-matching', 'mask-generation', 'ner', 'object-detection', 'sentiment-analysis', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'token-classification', 'video-classification', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection']"
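The KeyError above indicates that the installed transformers version does not register a "question-answering" pipeline task: it is absent from the supported-task list in the error message. A defensive sketch (with a hypothetical helper name, not from the notebook) that checks requested tasks against the registry's supported list before constructing any pipelines, so one missing task does not abort the whole cell:

```python
def available_tasks(requested, supported):
    """Split requested pipeline tasks into those the installed
    transformers version supports and those it does not."""
    supported_set = set(supported)
    ok = [t for t in requested if t in supported_set]
    missing = [t for t in requested if t not in supported_set]
    return ok, missing

# Tasks from the failing cell, checked against an abbreviated copy of
# the registry list reported in the traceback above.
requested = ["sentiment-analysis", "ner", "question-answering",
             "summarization", "text2text-generation"]
supported = ["ner", "sentiment-analysis", "text-classification",
             "text-generation", "token-classification"]

ok, missing = available_tasks(requested, supported)
print("ok:", ok)            # ['sentiment-analysis', 'ner']
print("missing:", missing)  # tasks to skip or handle differently
```

In the app, only the tasks in `ok` would be turned into pipelines; the `missing` ones can be reported in the UI instead of raising mid-initialization.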