AIOPS/MLOPS | Why Fine-Tuning and How to Fine-Tune a Model Using OpenAI | Sagar Kakkala's World






Why Fine-Tuning?

Let's say you run a company that uses LLMs such as ChatGPT, Gemini or Claude, and you decide to integrate one of them within your company. You want the model to answer questions using your own internal data, and you don't want that information to be available to the public.

This is where fine-tuning helps: it is like creating your own LLM with your own data. This is only one use case, and fine-tuning can serve many other purposes. We start from a base LLM that has already been trained on a large amount of data, and then train it further on our own data.

You could also run your own chatbot on your company website to help visitors find the information they need faster.

How Can I Do Fine-Tuning?

We will go through this in detail in this session, but in short, these are the steps:

1. Select a base LLM to train on your data, and check the costs you may incur

2. Create a custom dataset in JSONL format

3. Upload the dataset to OpenAI

4. Start a fine-tuning job that trains the model on the dataset


Prerequisites Before Creating a Fine-Tuned Model

You can also refer to OpenAI's own guide on how to fine-tune - OpenAI Fine-tuning

Step 1: Get an API key so we can connect to the model - OpenAI API key



Now click on "Create Secret Access key"


Save the generated key right away, as it is shown only once. This is the API key we will use to connect to our OpenAI model.

Step 2: Add a payment method so we can train our model; the charges depend on the model you select - OpenAI Billing

Click on "Add payment details" before fine-tuning


You also need to add an initial credit of between $5 and $100


You can track usage of this API at the following link - Track Usage



Step 3: Select a model based on your requirements - OpenAI Fine Tuning Pricing

Also make sure the model you select suits your requirements in terms of pricing



Step 4: Create a custom dataset ready for fine-tuning

We will create a custom dataset and ask an AI to turn it into a JSONL document, which is the format required for fine-tuning. To keep token usage and costs low, we will stick to simple questions and answers.

You can use ChatGPT to create the JSONL document with the following prompt:

Can you create a JSONL document for the following dataset for fine-tuning?


Q1. What is sagar kakkala favourite colour? A1. Sagar Kakkala fav colour is yellow

Q2. What is sagar kakkala favourite food? A2. Sagar Kakkala munches on chicken Biryani

Q3. What does sagar Kakkala do for a living? A3. Sagar Kakkala works as a senior infrastructure engineer

Q4. Which city does sagar belong to? A4. Sagar Kakkala is from City of Destiny, Visakhapatnam

Q5. What are Sagar Kakkala Hobbies? A5. Sagar Kakkala hobbies include surfing, boxing, Gym, Jiu Jitsu

Q6. What does sagar kakkala do on weekends? A6. Sagar Kakkala takes live sessions on weekends related to DevOps technologies

Q7. What is Sagar Kakkala Favourite Book? A7. Sagar Kakkala Fav Book is "Shiva Trilogy" written by Amish

Q8. What is Sagar Kakkala Favourite Movie? A8. Sagar Kakkala Fav Movie is "Vedham" by Krish

Q9. Which Anime is Sagar Kakkala Favourite? A9. Sagar Kakkala Fav Anime is "One Piece"

Q10. What DC superhero does Sagar Kakkala like Most? A10. Sagar Kakkala likes to be called "BATMAN"


You will get a response similar to the one below:

{"prompt": "What is sagar kakkala favourite colour?", "completion": " Sagar Kakkala fav colour is yellow"}
{"prompt": "What is sagar kakkala favourite food?", "completion": " Sagar Kakkala munches on chicken Biryani"}
{"prompt": "What does sagar Kakkala do for a living?", "completion": " Sagar Kakkala works as a senior infrastructure engineer"}
{"prompt": "Which city does sagar belong to?", "completion": " Sagar Kakkala is from City of Destiny, Visakhapatnam"}
{"prompt": "What are Sagar Kakkala Hobbies?", "completion": " Sagar Kakkala hobbies include surfing, boxing, Gym, Jiu Jitsu"}
{"prompt": "What does sagar kakkala do on weekends?", "completion": " Sagar Kakkala takes live sessions on weekends related to DevOps technologies"}
{"prompt": "What is Sagar Kakkala Favourite Book?", "completion": " Sagar Kakkala Fav Book is \"Shiva Trilogy\" written by Amish"}
{"prompt": "What is Sagar Kakkala Favourite Movie?", "completion": " Sagar Kakkala Fav Movie is \"Vedham\" by Krish"}
{"prompt": "Which Anime is Sagar Kakkala Favourite?", "completion": " Sagar Kakkala Fav Anime is \"One Piece\""}
{"prompt": "What DC superhero does Sagar Kakkala like Most?", "completion": " Sagar Kakkala likes to be called \"BATMAN\""}
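If you would rather not depend on ChatGPT for this step, a few lines of Python can produce the same prompt/completion JSONL. This is a minimal sketch: `to_prompt_completion_jsonl` and `write_jsonl` are illustrative names, and only the first two question/answer pairs are filled in.

```python
import json

# Question/answer pairs from the dataset above (only the first two shown;
# add the remaining pairs the same way).
qa_pairs = [
    ("What is sagar kakkala favourite colour?", "Sagar Kakkala fav colour is yellow"),
    ("What is sagar kakkala favourite food?", "Sagar Kakkala munches on chicken Biryani"),
]


def to_prompt_completion_jsonl(pairs):
    """Return one JSON object per line in the prompt/completion format."""
    lines = []
    for question, answer in pairs:
        # The completion starts with a space, matching the sample output above.
        record = {"prompt": question, "completion": " " + answer}
        # json.dumps escapes embedded double quotes automatically.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def write_jsonl(path, pairs):
    """Write the JSONL text for `pairs` to `path` (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_prompt_completion_jsonl(pairs))
```

For example, `write_jsonl("sagar_kakkala_dataset.jsonl", qa_pairs)` writes a file in the same shape as the sample output. Letting `json.dumps` build each line also takes care of escaping the quotes in answers such as "One Piece".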


Fine-Tuning a Model

Now let us use our JSONL dataset. Log in to an EC2 server; the commands below use apt, so they assume an Ubuntu instance. You can use a t2.micro instance for this project to save costs, and if you have a free-tier account it can save you even more.

sudo apt update


sudo apt install python3-pip -y




sudo apt install python3.12-venv






Now create a Python virtual environment

python3 -m venv openai-env




Now activate the virtual environment



source openai-env/bin/activate


Upgrade pip, then install or upgrade the openai and pandas packages (pandas is used by the data preparation tool below)

pip install --upgrade pip
pip install --upgrade openai
pip install pandas




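Now export your OpenAI API key as an environment variable so the openai CLI can pick it up

export OPENAI_API_KEY="sk-proj-9******"  # replace with your own OpenAI API key

Then write the dataset to a JSONL file on the server: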




cat > sagar_kakkala_dataset.jsonl <<'EOF'
{"prompt": "What is sagar kakkala favourite colour?", "completion": " Sagar Kakkala fav colour is yellow"}
{"prompt": "What is sagar kakkala favourite food?", "completion": " Sagar Kakkala munches on chicken Biryani"}
{"prompt": "What does sagar Kakkala do for a living?", "completion": " Sagar Kakkala works as a senior infrastructure engineer"}
{"prompt": "Which city does sagar belong to?", "completion": " Sagar Kakkala is from City of Destiny, Visakhapatnam"}
{"prompt": "What are Sagar Kakkala Hobbies?", "completion": " Sagar Kakkala hobbies include surfing, boxing, Gym, Jiu Jitsu"}
{"prompt": "What does sagar kakkala do on weekends?", "completion": " Sagar Kakkala takes live sessions on weekends related to DevOps technologies"}
{"prompt": "What is Sagar Kakkala Favourite Book?", "completion": " Sagar Kakkala Fav Book is \"Shiva Trilogy\" written by Amish"}
{"prompt": "What is Sagar Kakkala Favourite Movie?", "completion": " Sagar Kakkala Fav Movie is \"Vedham\" by Krish"}
{"prompt": "Which Anime is Sagar Kakkala Favourite?", "completion": " Sagar Kakkala Fav Anime is \"One Piece\""}
{"prompt": "What DC superhero does Sagar Kakkala like Most?", "completion": " Sagar Kakkala likes to be called \"BATMAN\""}
EOF







openai tools fine_tunes.prepare_data -f sagar_kakkala_dataset.jsonl




The command above checks the dataset and suggests fixes, such as removing duplicate entries and formatting the data so it is clean and structured. Once you accept its suggestions, the tool takes care of the rest and creates two additional files.

After the tool has prepared the dataset, you will see newly generated files with "prepared_train" and "prepared_valid" as suffixes in their names
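To get a feel for what this clean-up involves, here is a rough Python approximation that drops exact duplicate records and splits the rest into training and validation sets. It is an assumption-based sketch, not the prepare_data tool's actual logic; `dedupe_and_split` is an illustrative name, and the 20% validation share and fixed random seed are arbitrary choices.

```python
import json
import random


def dedupe_and_split(jsonl_text, valid_fraction=0.2, seed=42):
    """Drop exact duplicate records, then split the rest into (train, valid).

    A rough approximation for illustration; NOT the prepare_data tool's logic.
    """
    seen = set()
    records = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        record = json.loads(line)
        # Serialise with sorted keys so key order cannot hide a duplicate.
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            records.append(record)

    random.Random(seed).shuffle(records)  # fixed seed -> reproducible split
    # Keep at least one validation example when there are two or more records.
    n_valid = max(1, int(len(records) * valid_fraction)) if len(records) > 1 else 0
    return records[n_valid:], records[:n_valid]
```

With the ten examples in this dataset, the default settings give 8 training records and 2 validation records.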

Now upload both files:

openai api files.create -f sagar_kakkala_dataset_prepared_train.jsonl -p fine-tune


openai api files.create -f sagar_kakkala_dataset_prepared_valid.jsonl -p fine-tune


Now note the file IDs returned by the two upload commands:

For sagar_kakkala_dataset_prepared_train.jsonl - file-TrUeEJv6Y8UnsUyjyk1eNQ

For sagar_kakkala_dataset_prepared_valid.jsonl - file-6QVAW5NbiDRJVRXMX8MNHr


Troubleshoot as Homework

openai api fine_tuning.jobs.create -m ada -F file-TrUeEJv6Y8UnsUyjyk1eNQ -V file-6QVAW5NbiDRJVRXMX8MNHr

Note: Work out which model can be fine-tuned with a "prompt" (prompt/completion) dataset, as newer models need a "messages" dataset.

If you use a newer model with a "prompt" dataset, the fine-tuning job will not work and you will not be able to use the model.

Also, you cannot run a "messages" dataset through the data preparation tool to generate the "prepared" and "valid" files.

Alternatively, you can check which models OpenAI offers and see which one best suits a "prompt" dataset:

openai api models.list

Tip: Try fine-tuning with different models from the list and see which one suits you best
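The same list can be fetched from Python, which makes it easier to filter while you experiment. This is a minimal sketch assuming the openai>=1.0 Python client; `list_model_ids` is an illustrative helper name.

```python
def list_model_ids(client, contains=None):
    """Return the sorted IDs of models visible to your API key.

    `client` is expected to be an openai.OpenAI() instance (openai>=1.0).
    If `contains` is given, keep only IDs that include that substring.
    """
    # client.models.list() yields Model objects; each has an `.id` string.
    ids = sorted(model.id for model in client.models.list())
    if contains:
        ids = [model_id for model_id in ids if contains in model_id]
    return ids
```

Usage: `from openai import OpenAI`, then `print(list_model_ids(OpenAI(), contains="gpt"))`. Passing the client in as a parameter keeps the helper easy to test without network access.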


Workaround

Having tried a few base models from OpenAI, we will instead use one of the latest fine-tunable models, which requires the dataset to use the "messages" format:

cat > sk_dataset.jsonl <<'EOF'
{"messages":[{"role":"user","content":"What is sagar kakkala favourite colour?"},{"role":"assistant","content":"Sagar Kakkala fav colour is yellow"}]}
{"messages":[{"role":"user","content":"What is sagar kakkala favourite food?"},{"role":"assistant","content":"Sagar Kakkala munches on chicken Biryani"}]}
{"messages":[{"role":"user","content":"What does sagar Kakkala do for a living?"},{"role":"assistant","content":"Sagar Kakkala works as a senior infrastructure engineer"}]}
{"messages":[{"role":"user","content":"Which city does sagar belong to?"},{"role":"assistant","content":"Sagar Kakkala is from City of Destiny, Visakhapatnam"}]}
{"messages":[{"role":"user","content":"What are Sagar Kakkala Hobbies?"},{"role":"assistant","content":"Sagar Kakkala hobbies include surfing, boxing, Gym, Jiu Jitsu"}]}
{"messages":[{"role":"user","content":"What does sagar kakkala do on weekends?"},{"role":"assistant","content":"Sagar Kakkala takes live sessions on weekends related to DevOps technologies"}]}
{"messages":[{"role":"user","content":"What is Sagar Kakkala Favourite Book?"},{"role":"assistant","content":"Sagar Kakkala Fav Book is \"Shiva Trilogy\" written by Amish"}]}
{"messages":[{"role":"user","content":"What is Sagar Kakkala Favourite Movie?"},{"role":"assistant","content":"Sagar Kakkala Fav Movie is \"Vedham\" by Krish"}]}
{"messages":[{"role":"user","content":"Which Anime is Sagar Kakkala Favourite?"},{"role":"assistant","content":"Sagar Kakkala Fav Anime is \"One Piece\""}]}
{"messages":[{"role":"user","content":"What DC superhero does Sagar Kakkala like Most?"},{"role":"assistant","content":"Sagar Kakkala likes to be called \"BATMAN\""}]}
EOF
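That file is typed out by hand. As an alternative, a short script can convert the earlier prompt/completion file into the "messages" format. This is a minimal sketch; `prompt_to_messages` is an illustrative name, and it assumes every input line has `prompt` and `completion` keys.

```python
import json


def prompt_to_messages(jsonl_text):
    """Convert prompt/completion JSONL lines into chat "messages" JSONL lines."""
    out_lines = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        record = json.loads(line)
        chat = {
            "messages": [
                {"role": "user", "content": record["prompt"]},
                # strip() removes the leading space used in the completion field
                {"role": "assistant", "content": record["completion"].strip()},
            ]
        }
        # Compact separators match the layout of the hand-written sk_dataset.jsonl.
        out_lines.append(json.dumps(chat, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(out_lines) + "\n"
```

For example, read `sagar_kakkala_dataset.jsonl`, pass its contents to `prompt_to_messages`, and write the result to `sk_dataset.jsonl`.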



cp sk_dataset.jsonl sk_dataset_duplicate.jsonl
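Before uploading, a quick local check can catch formatting mistakes early. The sketch below only checks basic structure (valid JSON on every line, a non-empty `messages` list, known roles, string content, and at least one assistant reply); OpenAI applies its own, stricter validation, so treat an empty result as a sanity check rather than a guarantee. `validate_chat_jsonl` is an illustrative name.

```python
import json

# Roles accepted in a chat-format training example.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(jsonl_text):
    """Return a list of problems found; an empty list means no basic issues.

    Only checks structure, not token limits or any other rules OpenAI applies.
    """
    problems = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore empty lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {lineno}: missing or empty 'messages' list")
            continue
        for msg in messages:
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"line {lineno}: unexpected message {msg!r}")
            elif not isinstance(msg.get("content"), str):
                problems.append(f"line {lineno}: content must be a string")
        if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
            problems.append(f"line {lineno}: no assistant reply to learn from")
    return problems
```

For the file created above, `validate_chat_jsonl(open("sk_dataset.jsonl", encoding="utf-8").read())` should return an empty list.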

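Now upload the same data twice, once as the training file and once as the validation file, note the file IDs, and create the fine-tuning job with the commands below. This is a quick workaround: because the validation data is identical to the training data, the validation metrics will not tell you much about how well the model generalises.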

openai api files.create -f sk_dataset.jsonl -p fine-tune
openai api files.create -f sk_dataset_duplicate.jsonl -p fine-tune

openai api fine_tuning.jobs.create -F file-7a1CctbhFtBJyUCxG5P5Mb -V file-1iJEvbhWu8F4xtgdtjnnbU -m gpt-3.5-turbo


gpt-3.5-turbo requires the dataset to use the "messages" format.
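The same upload and job creation can also be done from Python with the openai package installed earlier. This is a minimal sketch assuming openai>=1.0, where `OpenAI()` reads `OPENAI_API_KEY` from the environment; `start_fine_tune` is an illustrative helper name.

```python
def start_fine_tune(client, train_path, valid_path, model="gpt-3.5-turbo"):
    """Upload training/validation files and start a fine-tuning job.

    `client` is expected to be an openai.OpenAI() instance (openai>=1.0).
    Returns the ID of the new fine-tuning job.
    """
    # Upload both files with purpose="fine-tune", like `openai api files.create -p fine-tune`.
    with open(train_path, "rb") as f:
        train_file = client.files.create(file=f, purpose="fine-tune")
    with open(valid_path, "rb") as f:
        valid_file = client.files.create(file=f, purpose="fine-tune")

    # Equivalent of `openai api fine_tuning.jobs.create -F ... -V ... -m ...`.
    job = client.fine_tuning.jobs.create(
        training_file=train_file.id,
        validation_file=valid_file.id,
        model=model,
    )
    return job.id
```

Usage: `from openai import OpenAI`, then `start_fine_tune(OpenAI(), "sk_dataset.jsonl", "sk_dataset_duplicate.jsonl")` returns the job ID, so you no longer need to copy file IDs by hand.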

Once you have created the fine-tuning job, OpenAI processes the training in batches, so it might take around half an hour for your model to be fully ready
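Rather than checking back manually, you can poll the job from Python. This is a minimal sketch assuming the openai>=1.0 client; `wait_for_job` is an illustrative name, and the polling interval and the cap on the number of polls are arbitrary choices.

```python
import time

# Statuses after which a fine-tuning job will not change any more.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=60, max_polls=120):
    """Poll a fine-tuning job until it finishes and return the final job object.

    `client` is expected to be an openai.OpenAI() instance (openai>=1.0).
    """
    for _ in range(max_polls):
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(f"{job_id}: {job.status}")
        if job.status in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still not finished after {max_polls} polls")
```

When the job succeeds, `job.fine_tuned_model` holds the name of your new model (it starts with `ft:`), which is the model to pick in the Playground or to pass in API calls.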

Once it is ready, you can try it out in the Playground using the UI provided by OpenAI - OpenAI-Finetune-UI






Click on "Playground" to open tabs showing responses from the base ChatGPT model and from your fine-tuned model, so you can compare them
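Besides the Playground, you can compare the two models from the same EC2 session. This is a minimal sketch assuming the openai>=1.0 client; `ask_fine_tuned` is an illustrative name, and the model name you pass is the `ft:` name reported by the finished job.

```python
def ask_fine_tuned(client, model_name, question):
    """Send one user question to a chat model and return the reply text.

    `client` is expected to be an openai.OpenAI() instance (openai>=1.0);
    `model_name` can be your fine-tuned "ft:..." model or a base model.
    """
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": question}],
    )
    # The first choice holds the assistant's reply.
    return response.choices[0].message.content
```

Calling it once with your fine-tuned model name and once with `gpt-3.5-turbo` for a question such as "What is sagar kakkala favourite food?" gives you the same comparison as the Playground tabs.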





With that, we have successfully fine-tuned our OpenAI model.









