GitHub Repository: aswintechguy/Deep-Learning-Projects
Path: blob/main/LLM Usage and Fine Tuning Llama3 - Unsloth/LLM Usage and Fine Tuning Llama3 - Unsloth.ipynb
Kernel: Python 3

Install Dependencies

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" -q !pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes xformers datasets -q
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheel for unsloth (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 16.1.0 which is incompatible.
google-colab 1.0.0 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.
ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 16.1.0 which is incompatible.
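Before loading the model, it is worth confirming that the runtime actually exposes a CUDA GPU (this notebook was run on a free Colab T4). The following quick check is not part of the original notebook, just a minimal sanity test:

# optional sanity check: make sure a CUDA GPU is visible before loading the model
import torch
print(torch.cuda.is_available())      # expect True on a GPU runtime
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4" on free-tier Colab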

LLM Inference

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit
)
FastLanguageModel.for_inference(model)
==((====))==  Unsloth: Fast Llama patching release 2024.7
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}"""
instruction = "You are a helpful assistant who can answer questions" input = "Who developed GPT models" # process the input inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda') outputs = model.generate(**inputs, max_new_tokens=100) response = tokenizer.batch_decode(outputs)[0] print(response)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Who developed GPT models

### Response:
OpenAI developed GPT models.<|end_of_text|>
instruction = "You are a helpful assistant who can answer questions" input = "Explain about Transformers in AI?" # process the input inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda') outputs = model.generate(**inputs, max_new_tokens=100, temperature = 0.1) response = tokenizer.batch_decode(outputs)[0] print(response)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Explain about Transformers in AI?

### Response:
Transformers are a type of artificial intelligence (AI) that uses a neural network to learn patterns in data. They are used in a variety of applications, including natural language processing, computer vision, and speech recognition. Transformers are able to learn complex patterns in data by using a neural network to process the data in a way that is similar to how the human brain processes information. This allows them to learn patterns in data that are too complex for traditional machine learning algorithms to handle.<|end_of_text|>
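For longer answers like the one above, it can be more convenient to stream tokens as they are generated instead of waiting for the full decode. A minimal sketch using Hugging Face's TextStreamer (not used in the original notebook) could look like this:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer)
inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda')
# tokens are printed to stdout as soon as they are generated
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=100)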

Fine Tuning

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 42,
    max_seq_length = max_seq_length
)
Unsloth 2024.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
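To confirm how small the trainable footprint is relative to the frozen 8B base weights, the PEFT wrapper returned by get_peft_model exposes print_trainable_parameters() (assuming the standard PEFT interface, which recent Unsloth releases use):

# prints the number of trainable LoRA parameters vs. total parameters
model.print_trainable_parameters()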
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {} ### Input: {} ### Response: {}"""
def format_input_prompt(examples):
    # get the lists by key
    instructions = examples['instruction']
    inputs = examples['input']
    outputs = examples['output']
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # format the input prompt
        text = alpaca_prompt.format(instruction, input, output)
        texts.append(text)
    return {"text": texts}
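Note that this formatting does not append an end-of-sequence token, so the fine-tuned model may not learn a clear stopping point. A possible variant, not used in this notebook (format_input_prompt_with_eos is a hypothetical name), appends tokenizer.eos_token to every example:

EOS_TOKEN = tokenizer.eos_token  # "<|end_of_text|>" for Llama 3

def format_input_prompt_with_eos(examples):
    texts = []
    for instruction, input, output in zip(examples['instruction'], examples['input'], examples['output']):
        # append EOS so the model learns where a response ends
        texts.append(alpaca_prompt.format(instruction, input, output) + EOS_TOKEN)
    return {"text": texts}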
# import the dataset
from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned", split='train')
dataset = dataset.map(format_input_prompt, batched=True)
dataset
Dataset({
    features: ['output', 'input', 'instruction', 'text'],
    num_rows: 51760
})
dataset[0]
{'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.', 'input': '', 'instruction': 'Give three tips for staying healthy.', 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Input:\n\n\n### Response:\n1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.'}
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,  # peft model
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 30,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 1234,
        output_dir = "outputs"
    )
)
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:318: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
max_steps is given, it will override any value given in num_train_epochs
trainer_stats = trainer.train()
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 51,760 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040
trainer_stats
TrainOutput(global_step=30, training_loss=2.679690368970235, metrics={'train_runtime': 226.5245, 'train_samples_per_second': 1.059, 'train_steps_per_second': 0.132, 'total_flos': 2750561593786368.0, 'train_loss': 2.679690368970235, 'epoch': 0.00463678516228748})
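The TrainOutput above already reports the headline numbers; for a quicker human-readable summary plus the peak GPU memory (which the stats do not include), something along these lines works, using the standard torch.cuda.max_memory_reserved API:

# optional: summarize runtime and peak GPU memory after training
runtime_s = trainer_stats.metrics['train_runtime']
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"Training took {runtime_s:.1f} s ({runtime_s / 60:.1f} min)")
print(f"Peak reserved GPU memory: {peak_gb:.2f} GB")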
## save the model
model.save_pretrained("./best_model")
tokenizer.save_pretrained('./best_model')
('./best_model/tokenizer_config.json', './best_model/special_tokens_map.json', './best_model/tokenizer.json')
## unsloth save model
from unsloth import unsloth_save_model

unsloth_save_model(model, tokenizer, "unsloth_model")
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... Done.
('unsloth_model', None)
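To reuse the fine-tuned adapters later (for example in a fresh session), they can be loaded back through FastLanguageModel.from_pretrained by pointing it at the saved directory. A sketch, assuming the ./best_model directory written above:

# reload the LoRA adapters (plus the 4-bit base weights) for inference in a new session
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./best_model",   # directory produced by save_pretrained above
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)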
FastLanguageModel.for_inference(model)

instruction = "You are a helpful assistant who can answer questions"
input = "Who developed GPT models"

# process the input
inputs = tokenizer([alpaca_prompt.format(instruction, input, "")], return_tensors='pt').to('cuda')
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.batch_decode(outputs)[0]
print(response)
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a helpful assistant who can answer questions

### Input:
Who developed GPT models

### Response:
OpenAI <|end_of_text|>
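Beyond saving the raw adapters, Unsloth also ships helpers for exporting a merged model (base weights with the LoRA folded in) or a GGUF file for llama.cpp. The exact helper names and arguments have varied between releases, so treat the following as a sketch to verify against the installed version; the output directory names are placeholders:

# sketch: merge LoRA into the base weights and save in 16-bit (check against your Unsloth version)
model.save_pretrained_merged("llama3_alpaca_merged", tokenizer, save_method = "merged_16bit")

# sketch: export a quantized GGUF for llama.cpp
model.save_pretrained_gguf("llama3_alpaca_gguf", tokenizer, quantization_method = "q4_k_m")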