Why Fine-Tuning ?
let us say, You hold a Company and you were using LLM like chatgpt, Gemini, Claude,... and you decided to integrate this within your Company. As a Company, You want your chatgpt to have answers to your custom Data internally and you don`t want to make the information available to public
This is where Fine-Tuning helps, its like creating your own LLM with your own Data, This is only one example of Fine Tuning where you can use it for many other puproses, We will be using base LLM Model as reference which is already trained with lot of Data , and we will be training it with our Own Data
You can also have your own chatbot in your company website which can help public to navigate faster with the required Data they need
How can i do Fine-Tuning ?
We will learn in detail in this session, but to simply put, these would be the following steps
1. select a base LLM Model for which you want your Data to Train and costs you may incurr
2. Create a Custom Dataset with JSONL format
3. Upload this Dataset to LLM Model
4. Train your Model to use the Dataset ()
Pre-Requisties Before making a Fine-tune Model
You can also check here on how to do Fine-Tuning from OpenAI - OpenAI Fine-tuning
Step1: Get an API key for us to connect with the Model - OpenAI API key
Now click on "Create Secret Access key"
Save the Key Generated as it appears only once , this would be our API key to connect to our OpenAI model
Step2: You need to have payment method added for us to train our Model and payment will be based on Model you select - OpenAI Billing
Click on Add Payment details Before Fine tuning
And you need to add initial credit between 5 and 100 Dollars
You can track usage of this API from the following link - Track Usage
Step3: Select Model based on your Requirement - OpenAI Fine Tuning Pricing
Also make sure to select which Model suits your requirement Based on Pricing
Step4: Create a Custom Dataset ready for fine-tuning
We will create a custom Dataset and will ask AI to create JSONL document for us , which is required for fine-tuning, consider token and costs, we will be going with simple question and answers
You can use chatgpt to create JSONL Document using Chatgpt with Prompt
Can you Create JSONL Document for following Dataset for fine tuning
Q1.What is sagar kakkala favourite coulor , A1. Sagar Kakkala fav colour is yellow
Q2. What is sagar kakkala favourite food? , A2. Sagar Kakkala munches on chicken Biryani
Q3. What do sagar Kakkala do for living? ,A3. Sagar kakkala works as senior infrastructure engineer
Q4. Which city does sagar belong to?, A4. Sagar Kakkala is from City of Destiny, Visakhapatnam
Q5. What are Sagar Kakkala Hobbies ,A5. Sagar Kakkala hobies include surfing,boxing, Gym, Jiu Jitsu
Q6. What do sagar kakkala do on weekends, A6. Sagar Kakkala takes live session on weekends related to DevOps technologies
Q7. What is Sagar Kakkala Favourite Book ? , A7. Sagar Kakkala Fav Book is "Shiva Triology" written by Amish
Q8. What is Sagar Kakkala Favourite Movie, A8. Sagar kakkala Fav Movie is "Vedham" by Krish
Q9. Which Anime is Sagar Kakkala Favorutie, A9 Sagar Kakkala Fav Anime is "One Piece"
Q10. What DC superhero does Sagar Kakkala like Most, A10. Sagar Kakkala likes to be called as "BATMAN"
You will get response something similar as below
{"prompt": "What is sagar kakkala favourite coulor?", "completion": " Sagar Kakkala fav colour is yellow"}
{"prompt": "What is sagar kakkala favourite food?", "completion": " Sagar Kakkala munches on chicken Biryani"}
{"prompt": "What do sagar Kakkala do for living?", "completion": " Sagar kakkala works as senior infrastructure engineer"}
{"prompt": "Which city does sagar belong to?", "completion": " Sagar Kakkala is from City of Destiny, Visakhapatnam"}
{"prompt": "What are Sagar Kakkala Hobbies?", "completion": " Sagar Kakkala hobies include surfing, boxing, Gym, Jiu Jitsu"}
{"prompt": "What do sagar kakkala do on weekends?", "completion": " Sagar Kakkala takes live session on weekends related to DevOps technologies"}
{"prompt": "What is Sagar Kakkala Favourite Book?", "completion": " Sagar Kakkala Fav Book is \"Shiva Triology\" written by Amish"}
{"prompt": "What is Sagar Kakkala Favourite Movie?", "completion": " Sagar kakkala Fav Movie is \"Vedham\" by Krish"}
{"prompt": "Which Anime is Sagar Kakkala Favorutie?", "completion": " Sagar Kakkala Fav Anime is \"One Piece\""}
{"prompt": "What DC superhero does Sagar Kakkala like Most?", "completion": " Sagar Kakkala likes to be called as \"BATMAN\""}
Fine-Tuning a Model
Now let us use our JSONL Dataset, login to EC2 server, You can use t2.micro for this project to save costs and if you have free-tier account , it can save you lot of costs
sudo apt update
sudo apt install python3-pip -y
sudo apt install python3.12-venv
python3 -m venv openai-env
source openai-env/bin/activate
pip install --upgrade pip
pip install --upgrade openai
pip install pandas
export OPENAI_API_KEY="sk-proj-9******" #Place your OPEN_API_KEY
cat > sagar_kakkala_dataset.jsonl <<'EOF'{"prompt": "What is sagar kakkala favourite coulor?", "completion": " Sagar Kakkala fav colour is yellow"}{"prompt": "What is sagar kakkala favourite food?", "completion": " Sagar Kakkala munches on chicken Biryani"}{"prompt": "What do sagar Kakkala do for living?", "completion": " Sagar kakkala works as senior infrastructure engineer"}{"prompt": "Which city does sagar belong to?", "completion": " Sagar Kakkala is from City of Destiny, Visakhapatnam"}{"prompt": "What are Sagar Kakkala Hobbies?", "completion": " Sagar Kakkala hobies include surfing, boxing, Gym, Jiu Jitsu"}{"prompt": "What do sagar kakkala do on weekends?", "completion": " Sagar Kakkala takes live session on weekends related to DevOps technologies"}{"prompt": "What is Sagar Kakkala Favourite Book?", "completion": " Sagar Kakkala Fav Book is \"Shiva Triology\" written by Amish"}{"prompt": "What is Sagar Kakkala Favourite Movie?", "completion": " Sagar kakkala Fav Movie is \"Vedham\" by Krish"}{"prompt": "Which Anime is Sagar Kakkala Favorutie?", "completion": " Sagar Kakkala Fav Anime is \"One Piece\""}{"prompt": "What DC superhero does Sagar Kakkala like Most?", "completion": " Sagar Kakkala likes to be called as \"BATMAN\""}EOF
openai tools fine_tunes.prepare_data -f sagar_kakkala_dataset.jsonl
openai api files.create -f sagar_kakkala_dataset_prepared_train.jsonl -p fine-tune
openai api files.create -f sagar_kakkala_dataset_prepared_valid.jsonl -p fine-tune
Troubleshoot as Homework
openai api fine_tuning.jobs.create -m ada -F file-TrUeEJv6Y8UnsUyjyk1eNQ -V file-6QVAW5NbiDRJVRXMX8MNHr
openai api models.list
Workaround
cp sk_dataset.jsonl sk_dataset_duplicate.jsonl
openai api files.create -f sk_dataset.jsonl -p fine-tune
openai api files.create -f sk_dataset_duplicate.jsonl -p fine-tune
openai api fine_tuning.jobs.create -F file-7a1CctbhFtBJyUCxG5P5Mb -V file-1iJEvbhWu8F4xtgdtjnnbU -m gpt-3.5-turbo
Comments
Post a Comment