Why Fine-Tuning?

Let us say you run a company and have been using an LLM such as ChatGPT, Gemini, or Claude, and you decide to integrate it into your company. As a company, you want the model to answer questions from your internal custom data, and you don't want that information to be available to the public.

This is where fine-tuning helps. It is like creating your own LLM with your own data. This is only one example; fine-tuning can be used for many other purposes. We will take a base LLM that has already been trained on a large amount of data as our starting point and train it further on our own data.

You can also put your own chatbot on your company website to help visitors find the data they need faster.

How can I do Fine-Tuning?

We will cover this in detail in this session, but put simply, these are the steps:

1. Select a base LLM model to train on your data, and check the costs you may incur

2. Create a custom dataset in JSONL format

3. Upload this dataset to the LLM model

4. Train your model on the dataset

Prerequisites Before Making a Fine-Tuned Model

You can also check OpenAI's own guide on how to do fine-tuning - OpenAI Fine-tuning

Step1: Get an API key so we can connect to the model - OpenAI API key

Now click on "Create Secret Access key"

Save the generated key, as it is shown only once. This will be our API key to connect to our OpenAI model

Step2: You need to add a payment method so we can train our model. The charge depends on the model you select - OpenAI Billing

Click on "Add payment details" before fine-tuning

You also need to add an initial credit of between 5 and 100 dollars

You can track usage of this API from the following link - Track Usage

Step3: Select a model based on your requirements - OpenAI Fine Tuning Pricing

Also make sure the model you select suits your requirements in terms of pricing
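To get a feel for how pricing scales with the size of your data, here is a rough sketch in Python. It assumes that training is billed per token and per epoch, and it approximates tokens as about four characters each, which is only a rule of thumb. The function name, the parameters, and the price used in the usage note are placeholders and not values from the pricing page.

```python
def estimate_training_cost(texts, price_per_million_tokens, epochs=1, chars_per_token=4):
    """Rough training cost estimate for a list of text strings.

    Assumptions (not taken from the pricing page):
      - a token is roughly `chars_per_token` characters long
      - cost = tokens in the dataset * epochs * price per token
    """
    total_chars = sum(len(text) for text in texts)
    approx_tokens = total_chars / chars_per_token
    return approx_tokens * epochs * price_per_million_tokens / 1_000_000
```

For example, `estimate_training_cost(my_texts, price_per_million_tokens=8.0, epochs=3)` returns an estimate in the same currency as the price you pass in. Always take the real price from the pricing page of the model you choose.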

Step4: Create a custom dataset ready for fine-tuning

We will create a custom dataset and ask AI to create the JSONL document for us, which is required for fine-tuning. To keep token usage and costs low, we will go with simple questions and answers

You can use ChatGPT to create the JSONL document with the following prompt

Can you Create JSONL Document for following Dataset for fine tuning

Q1.What is sagar kakkala favourite coulor , A1. Sagar Kakkala fav colour is yellow

Q2. What is sagar kakkala favourite food? , A2. Sagar Kakkala munches on chicken Biryani

Q3. What do sagar Kakkala do for living? ,A3. Sagar kakkala works as senior infrastructure engineer

Q4. Which city does sagar belong to?, A4. Sagar Kakkala is from City of Destiny, Visakhapatnam

Q5. What are Sagar Kakkala Hobbies ,A5. Sagar Kakkala hobies include surfing,boxing, Gym, Jiu Jitsu

Q6. What do sagar kakkala do on weekends, A6. Sagar Kakkala takes live session on weekends related to DevOps technologies

Q7. What is Sagar Kakkala Favourite Book ? , A7. Sagar Kakkala Fav Book is "Shiva Triology" written by Amish

Q8. What is Sagar Kakkala Favourite Movie, A8. Sagar kakkala Fav Movie is "Vedham" by Krish

Q9. Which Anime is Sagar Kakkala Favorutie, A9 Sagar Kakkala Fav Anime is "One Piece"

Q10. What DC superhero does Sagar Kakkala like Most, A10. Sagar Kakkala likes to be called as "BATMAN"

You will get a response similar to the one below

{"prompt": "What is sagar kakkala favourite coulor?", "completion": " Sagar Kakkala fav colour is yellow"}

{"prompt": "What is sagar kakkala favourite food?", "completion": " Sagar Kakkala munches on chicken Biryani"}

{"prompt": "What do sagar Kakkala do for living?", "completion": " Sagar kakkala works as senior infrastructure engineer"}

{"prompt": "Which city does sagar belong to?", "completion": " Sagar Kakkala is from City of Destiny, Visakhapatnam"}

{"prompt": "What are Sagar Kakkala Hobbies?", "completion": " Sagar Kakkala hobies include surfing, boxing, Gym, Jiu Jitsu"}

{"prompt": "What do sagar kakkala do on weekends?", "completion": " Sagar Kakkala takes live session on weekends related to DevOps technologies"}

{"prompt": "What is Sagar Kakkala Favourite Book?", "completion": " Sagar Kakkala Fav Book is \"Shiva Triology\" written by Amish"}

{"prompt": "What is Sagar Kakkala Favourite Movie?", "completion": " Sagar kakkala Fav Movie is \"Vedham\" by Krish"}

{"prompt": "Which Anime is Sagar Kakkala Favorutie?", "completion": " Sagar Kakkala Fav Anime is \"One Piece\""}

{"prompt": "What DC superhero does Sagar Kakkala like Most?", "completion": " Sagar Kakkala likes to be called as \"BATMAN\""}
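If you would rather not depend on ChatGPT for this step, the same kind of file can be built locally. Below is a minimal Python sketch that uses only the standard library. The names `QA_PAIRS`, `to_prompt_completion_lines` and `write_jsonl` are made up for this example, and only the first three question/answer pairs are listed to keep it short.

```python
import json

# Question/answer pairs copied from the dataset above (first three only).
QA_PAIRS = [
    ("What is sagar kakkala favourite coulor?", "Sagar Kakkala fav colour is yellow"),
    ("What is sagar kakkala favourite food?", "Sagar Kakkala munches on chicken Biryani"),
    ("What do sagar Kakkala do for living?", "Sagar kakkala works as senior infrastructure engineer"),
]


def to_prompt_completion_lines(pairs):
    """Return one JSON string per pair in the prompt/completion layout."""
    lines = []
    for question, answer in pairs:
        # The leading space in the completion mirrors the sample response above.
        record = {"prompt": question, "completion": " " + answer}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


def write_jsonl(path, lines):
    """Write the JSON strings to `path`, one object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

Calling `write_jsonl("sagar_kakkala_dataset.jsonl", to_prompt_completion_lines(QA_PAIRS))` writes the file. Letting `json.dumps` do the quoting means answers containing double quotes are escaped correctly.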

Fine-Tuning a Model

Now let us use our JSONL dataset. Log in to an EC2 server. You can use a t2.micro for this project to save costs, and if you have a free-tier account it can save you a lot

sudo apt update

sudo apt install python3-pip -y

sudo apt install python3.12-venv

Now create a Python virtual environment

python3 -m venv openai-env

Now activate the Python venv

source openai-env/bin/activate

Upgrade pip and openai, and install pandas

pip install --upgrade pip
pip install --upgrade openai
pip install pandas

Now export the OpenAI API key

export OPENAI_API_KEY="sk-proj-9******" # Replace with your own OpenAI API key

cat > sagar_kakkala_dataset.jsonl <<'EOF'
{"prompt": "What is sagar kakkala favourite coulor?", "completion": " Sagar Kakkala fav colour is yellow"}
{"prompt": "What is sagar kakkala favourite food?", "completion": " Sagar Kakkala munches on chicken Biryani"}
{"prompt": "What do sagar Kakkala do for living?", "completion": " Sagar kakkala works as senior infrastructure engineer"}
{"prompt": "Which city does sagar belong to?", "completion": " Sagar Kakkala is from City of Destiny, Visakhapatnam"}
{"prompt": "What are Sagar Kakkala Hobbies?", "completion": " Sagar Kakkala hobies include surfing, boxing, Gym, Jiu Jitsu"}
{"prompt": "What do sagar kakkala do on weekends?", "completion": " Sagar Kakkala takes live session on weekends related to DevOps technologies"}
{"prompt": "What is Sagar Kakkala Favourite Book?", "completion": " Sagar Kakkala Fav Book is \"Shiva Triology\" written by Amish"}
{"prompt": "What is Sagar Kakkala Favourite Movie?", "completion": " Sagar kakkala Fav Movie is \"Vedham\" by Krish"}
{"prompt": "Which Anime is Sagar Kakkala Favorutie?", "completion": " Sagar Kakkala Fav Anime is \"One Piece\""}
{"prompt": "What DC superhero does Sagar Kakkala like Most?", "completion": " Sagar Kakkala likes to be called as \"BATMAN\""}
EOF

openai tools fine_tunes.prepare_data -f sagar_kakkala_dataset.jsonl

The command above removes duplicate entries and formats the data so that it is clean and structured. The OpenAI tool takes care of this and creates two additional files

Once the OpenAI tool has structured the dataset, you will get newly generated files with "prepared_train" and "prepared_valid" as suffixes in their names
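To make the idea of removing duplicates and producing train and valid files more concrete, here is a small Python sketch. It is not how the OpenAI tool is implemented; the function names and the rule of holding out the last few records for validation are assumptions made for illustration only.

```python
import json


def dedupe_records(records):
    """Drop exact duplicate records while keeping the original order."""
    seen = set()
    unique = []
    for record in records:
        # Serialise with sorted keys so key order does not hide a duplicate.
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def split_train_valid(records, valid_count):
    """Hold out the last `valid_count` records as a validation set."""
    if not 0 <= valid_count < len(records):
        raise ValueError("valid_count must be >= 0 and smaller than the dataset")
    cut = len(records) - valid_count
    return records[:cut], records[cut:]
```

With a dataset as small as ours, the validation set will only contain a handful of rows, so treat any validation numbers as a rough signal.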

Now upload the files

openai api files.create -f sagar_kakkala_dataset_prepared_train.jsonl -p fine-tune

openai api files.create -f sagar_kakkala_dataset_prepared_valid.jsonl -p fine-tune

Now get the file IDs from the output of the file uploads

For sagar_kakkala_dataset_prepared_train.jsonl - file-TrUeEJv6Y8UnsUyjyk1eNQ

For sagar_kakkala_dataset_prepared_valid.jsonl - file-6QVAW5NbiDRJVRXMX8MNHr

Troubleshoot as Homework

openai api fine_tuning.jobs.create -m ada -F file-TrUeEJv6Y8UnsUyjyk1eNQ -V file-6QVAW5NbiDRJVRXMX8MNHr

Note: Work out which model accepts the "prompt" dataset, as the newer models need a "messages" dataset

If you use a newer model with the "prompt" dataset, we will not be able to use the model

Also, you cannot use the "messages" dataset to generate the "prepared" and "valid" files

As an alternative, you can check which models OpenAI offers and see which one that accepts "prompt" suits best

openai api models.list

Tip: Try fine-tuning with different models from the list and see which one suits best

Workaround

Since I have already tried a few base models from OpenAI, we will use one of the latest fine-tunable models, which requires the dataset to use "messages" instead of "prompt"

cat > sk_dataset.jsonl <<'EOF'
{"messages":[{"role":"user","content":"What is sagar kakkala favourite coulor?"},{"role":"assistant","content":"Sagar Kakkala fav colour is yellow"}]}
{"messages":[{"role":"user","content":"What is sagar kakkala favourite food?"},{"role":"assistant","content":"Sagar Kakkala munches on chicken Biryani"}]}
{"messages":[{"role":"user","content":"What do sagar Kakkala do for living?"},{"role":"assistant","content":"Sagar kakkala works as senior infrastructure engineer"}]}
{"messages":[{"role":"user","content":"Which city does sagar belong to?"},{"role":"assistant","content":"Sagar Kakkala is from City of Destiny, Visakhapatnam"}]}
{"messages":[{"role":"user","content":"What are Sagar Kakkala Hobbies?"},{"role":"assistant","content":"Sagar Kakkala hobies include surfing, boxing, Gym, Jiu Jitsu"}]}
{"messages":[{"role":"user","content":"What do sagar kakkala do on weekends?"},{"role":"assistant","content":"Sagar Kakkala takes live session on weekends related to DevOps technologies"}]}
{"messages":[{"role":"user","content":"What is Sagar Kakkala Favourite Book?"},{"role":"assistant","content":"Sagar Kakkala Fav Book is \"Shiva Triology\" written by Amish"}]}
{"messages":[{"role":"user","content":"What is Sagar Kakkala Favourite Movie?"},{"role":"assistant","content":"Sagar kakkala Fav Movie is \"Vedham\" by Krish"}]}
{"messages":[{"role":"user","content":"Which Anime is Sagar Kakkala Favorutie?"},{"role":"assistant","content":"Sagar Kakkala Fav Anime is \"One Piece\""}]}
{"messages":[{"role":"user","content":"What DC superhero does Sagar Kakkala like Most?"},{"role":"assistant","content":"Sagar Kakkala likes to be called as \"BATMAN\""}]}
EOF
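Instead of retyping the dataset by hand, the earlier "prompt"/"completion" file can be converted to the "messages" layout with a few lines of Python. This is a minimal sketch using only the standard library; the function names `prompt_to_messages` and `convert_file` are made up for this example.

```python
import json


def prompt_to_messages(line):
    """Convert one prompt/completion JSONL line to the messages layout."""
    record = json.loads(line)
    messages = [
        {"role": "user", "content": record["prompt"]},
        # Strip the leading space that the prompt/completion layout uses.
        {"role": "assistant", "content": record["completion"].strip()},
    ]
    # Compact separators give the same spacing as the dataset shown above.
    return json.dumps({"messages": messages}, separators=(",", ":"), ensure_ascii=False)


def convert_file(src_path, dst_path):
    """Convert every non-empty line of `src_path` and write it to `dst_path`."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(prompt_to_messages(line) + "\n")
```

For example, `convert_file("sagar_kakkala_dataset.jsonl", "sk_dataset.jsonl")` produces a file in the same shape as the one above.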

cp sk_dataset.jsonl sk_dataset_duplicate.jsonl

Now you can upload the same file twice (the original and the copy), get the file IDs, and create the fine-tuning job using the commands below

openai api files.create -f sk_dataset.jsonl -p fine-tune

openai api files.create -f sk_dataset_duplicate.jsonl -p fine-tune

openai api fine_tuning.jobs.create -F file-7a1CctbhFtBJyUCxG5P5Mb -V file-1iJEvbhWu8F4xtgdtjnnbU -m gpt-3.5-turbo
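The same upload and job creation can also be done from Python with the openai package we installed earlier. The sketch below assumes version 1.x of the package, where `client.files.create` and `client.fine_tuning.jobs.create` are available. The helper name `start_fine_tune` is made up for this example.

```python
def start_fine_tune(client, train_path, valid_path, model="gpt-3.5-turbo"):
    """Upload both files and start a fine-tuning job; return the job object.

    `client` is expected to be an OpenAI client, created for example with:
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    """
    with open(train_path, "rb") as handle:
        train_file = client.files.create(file=handle, purpose="fine-tune")
    with open(valid_path, "rb") as handle:
        valid_file = client.files.create(file=handle, purpose="fine-tune")
    # The file IDs returned by the uploads are passed to the job.
    return client.fine_tuning.jobs.create(
        training_file=train_file.id,
        validation_file=valid_file.id,
        model=model,
    )
```

A call such as `job = start_fine_tune(client, "sk_dataset.jsonl", "sk_dataset_duplicate.jsonl")` returns a job object whose `id` can be used to check on progress later.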

gpt-3.5-turbo requires the dataset to have "messages" in it
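Before uploading, it can help to check locally that every line really is in the "messages" layout. Here is a minimal sketch of such a check. The function name and the set of allowed roles are assumptions for this example and only cover the roles a simple question and answer dataset needs.

```python
import json

# Roles accepted by this sketch (an assumption covering simple Q&A datasets).
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_messages_line(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error.msg}"]
    if not isinstance(record, dict) or not isinstance(record.get("messages"), list):
        return ['missing a "messages" list']
    problems = []
    has_assistant = False
    for index, message in enumerate(record["messages"]):
        if not isinstance(message, dict):
            problems.append(f"message {index}: not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index}: unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index}: content must be a string")
        if message.get("role") == "assistant":
            has_assistant = True
    if not has_assistant:
        problems.append("no assistant message for the model to learn from")
    return problems
```

Running this over each line of `sk_dataset.jsonl` should give an empty list every time, while a line from the older "prompt" dataset is reported as missing a "messages" list.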

Once you have created the fine-tuning job, OpenAI processes the training in batches, so it might take half an hour for your model to be fully ready
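While waiting, the job status can be polled from Python rather than refreshing the page. The sketch below assumes version 1.x of the openai package, where `client.fine_tuning.jobs.retrieve` returns a job with a `status` field. The helper name, the polling interval, and the retry limit are arbitrary choices for this example.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=60, max_polls=60, sleep=time.sleep):
    """Poll the job until it reaches a terminal status or the polls run out."""
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        # Still validating, queued or running: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

When the returned job has the status "succeeded", its `fine_tuned_model` field holds the name of the new model.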

Once it is ready, you can try it out in the Playground using the UI provided by OpenAI - OpenAI-Finetune-UI

You can click on "Playground", which gives you tabs with the responses from the ChatGPT model and from your fine-tuned model
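The fine-tuned model can also be queried from code instead of the Playground. This is a minimal sketch assuming version 1.x of the openai package and its `client.chat.completions.create` call. The helper name `ask` is made up, and the model name in the usage note is a placeholder for whatever name your own job returns.

```python
def ask(client, model_name, question):
    """Send one user question to the given model and return the reply text."""
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```

For example, `ask(client, "ft:your-model-name", "What is sagar kakkala favourite food?")` should return an answer based on the dataset, where `client` is an `OpenAI()` client and `"ft:your-model-name"` stands in for your real fine-tuned model name.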

With that, we have successfully fine-tuned our OpenAI model


AIOPS/MLOPS | Why Fine-Tuning and How to Fine-Tune a Model using OpenAI | Sagar Kakkala´s World