GitHub Repository: aswintechguy/Deep-Learning-Projects
Path: blob/main/LLM Usage and Fine Tuning Llama3 - Unsloth/LLM Usage and Fine Tuning Llama3 - Unsloth.ipynb
Kernel: Python 3

Install Dependencies

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" -q !pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes xformers datasets -q
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheel for unsloth (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 16.1.0 which is incompatible.
google-colab 1.0.0 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.
ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 16.1.0 which is incompatible.
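Before loading the model, it is worth confirming that the runtime actually exposes a CUDA GPU (this notebook was run on a free Colab T4). The following quick check is not part of the original notebook, just a minimal sanity test:

# optional sanity check: make sure a CUDA GPU is visible before loading the model
import torch
print(torch.cuda.is_available())      # expect True on a GPU runtime
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4" on free-tier Colab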

LLM Inference

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)
FastLanguageModel.for_inference(model)
==((====))==  Unsloth: Fast Llama patching release 2024.7
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}"""
instruction = "You are a helpful assistant who can answer questions" input = "Who developed GPT models" # process the input inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda') outputs = model.generate(**inputs, max_new_tokens=100) response = tokenizer.batch_decode(outputs)[0] print(response)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Who developed GPT models

### Response:
OpenAI developed GPT models.<|end_of_text|>
instruction = "You are a helpful assistant who can answer questions" input = "Explain about Transformers in AI?" # process the input inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda') outputs = model.generate(**inputs, max_new_tokens=100, temperature = 0.1) response = tokenizer.batch_decode(outputs)[0] print(response)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Explain about Transformers in AI?

### Response:
Transformers are a type of artificial intelligence (AI) that uses a neural network to learn patterns in data. They are used in a variety of applications, including natural language processing, computer vision, and speech recognition. Transformers are able to learn complex patterns in data by using a neural network to process the data in a way that is similar to how the human brain processes information. This allows them to learn patterns in data that are too complex for traditional machine learning algorithms to handle.<|end_of_text|>
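For longer answers like the one above, it can be more convenient to stream tokens as they are generated instead of waiting for the full decode. A minimal sketch using Hugging Face's TextStreamer (not used in the original notebook) could look like this:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer)
inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda')
# tokens are printed to stdout as soon as they are generated
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=100)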

Fine Tuning

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 42,
    max_seq_length = max_seq_length
)
Unsloth 2024.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
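To confirm how small the trainable footprint is relative to the frozen 8B base weights, the PEFT wrapper returned by get_peft_model exposes print_trainable_parameters() (assuming the standard PEFT interface, which recent Unsloth releases use):

# prints the number of trainable LoRA parameters vs. total parameters
model.print_trainable_parameters()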
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}"""
def format_input_prompt(examples):
    # get the lists by key
    instructions = examples['instruction']
    inputs = examples['input']
    outputs = examples['output']
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # format the input prompt
        text = alpaca_prompt.format(instruction, input, output)
        texts.append(text)
    return {"text": texts}
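Note that this formatting does not append an end-of-sequence token, so the fine-tuned model may not learn a clear stopping point. A possible variant, not used in this notebook (format_input_prompt_with_eos is a hypothetical name), appends tokenizer.eos_token to every example:

EOS_TOKEN = tokenizer.eos_token  # "<|end_of_text|>" for Llama 3

def format_input_prompt_with_eos(examples):
    texts = []
    for instruction, input, output in zip(examples['instruction'], examples['input'], examples['output']):
        # append EOS so the model learns where a response ends
        texts.append(alpaca_prompt.format(instruction, input, output) + EOS_TOKEN)
    return {"text": texts}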
# import the dataset
from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned", split='train')
dataset = dataset.map(format_input_prompt, batched=True)
dataset
Dataset({
    features: ['output', 'input', 'instruction', 'text'],
    num_rows: 51760
})
dataset[0]
{'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.', 'input': '', 'instruction': 'Give three tips for staying healthy.', 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Input:\n\n\n### Response:\n1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.'}
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,  # peft model
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 30,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 1234,
        output_dir = "outputs"
    )
)
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:318: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
max_steps is given, it will override any value given in num_train_epochs
trainer_stats = trainer.train()
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 51,760 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040
trainer_stats
TrainOutput(global_step=30, training_loss=2.679690368970235, metrics={'train_runtime': 226.5245, 'train_samples_per_second': 1.059, 'train_steps_per_second': 0.132, 'total_flos': 2750561593786368.0, 'train_loss': 2.679690368970235, 'epoch': 0.00463678516228748})
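The TrainOutput above already reports the headline numbers; for a quicker human-readable summary plus the peak GPU memory (which the stats do not include), something along these lines works, using the standard torch.cuda.max_memory_reserved API:

# optional: summarize runtime and peak GPU memory after training
runtime_s = trainer_stats.metrics['train_runtime']
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"Training took {runtime_s:.1f} s ({runtime_s / 60:.1f} min)")
print(f"Peak reserved GPU memory: {peak_gb:.2f} GB")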
## save the model
model.save_pretrained("./best_model")
tokenizer.save_pretrained('./best_model')
('./best_model/tokenizer_config.json', './best_model/special_tokens_map.json', './best_model/tokenizer.json')
## unsloth save model
from unsloth import unsloth_save_model

unsloth_save_model(model, tokenizer, "unsloth_model")
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... Done.
('unsloth_model', None)
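To reuse the fine-tuned adapters later (for example in a fresh session), they can be loaded back through FastLanguageModel.from_pretrained by pointing it at the saved directory. A sketch, assuming the ./best_model directory written above:

# reload the LoRA adapters (plus the 4-bit base weights) for inference in a new session
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./best_model",   # directory produced by save_pretrained above
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)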
FastLanguageModel.for_inference(model)

instruction = "You are a helpful assistant who can answer questions"
input = "Who developed GPT models"

# process the input
inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda')
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.batch_decode(outputs)[0]
print(response)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Who developed GPT models

### Response:
OpenAI <|end_of_text|>
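Beyond saving the raw adapters, Unsloth also ships helpers for exporting a merged model (base weights with the LoRA folded in) or a GGUF file for llama.cpp. The exact helper names and arguments have varied between releases, so treat the following as a sketch to verify against the installed version; the output directory names are placeholders:

# sketch: merge LoRA into the base weights and save in 16-bit (check against your Unsloth version)
model.save_pretrained_merged("llama3_alpaca_merged", tokenizer, save_method = "merged_16bit")

# sketch: export a quantized GGUF for llama.cpp
model.save_pretrained_gguf("llama3_alpaca_gguf", tokenizer, quantization_method = "q4_k_m")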