Hrushikesh Modupalli

How I Fine-Tuned an AI Model for the First Time

-by Hrushikesh

OK, I wanted to fine-tune a model. The plan: learn about fine-tuning, decide which AI model to fine-tune, read up on that model, collect a perfect dataset, and then start doing it… NOPE!!!

This plan kept delaying me, cuz instead of deciding on an AI model I was stuck in a dilemma, hunting for the perfect model to fine-tune, which in hindsight was a big mistake. It led to confusion and procrastination. So I decided to get my hands dirty and do it anyway. My mindset was: I'm probably gonna break it, but at least I'll learn something.

I started by writing down a bunch of models and picking one with a random function. Yes, I actually picked it with Math.random() * 10 in a browser console. And (drumroll) it was Salesforce/codet5p-770m, a model I hadn't known existed a couple of minutes earlier; I only knew about it because it was 7th on the piece of paper I wrote the list on. Here are the pictures…
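For anyone who wants to copy the trick: `Math.random()` returns a float in [0, 1), so `Math.random() * 10` gives something like 6.37, and flooring it gives an index from 0 to 9. Below is a minimal Python sketch of the same idea, assuming a 10-item list numbered from 1 like a handwritten one. The `pick_position` helper and the placeholder model names are hypothetical, not taken from the original list.

```python
import math
import random

def pick_position(n_items, rng=random):
    """Pick a 1-based position on a list of n_items, uniformly at random.

    Same idea as Math.floor(Math.random() * n_items) + 1 in a browser console:
    rng.random() returns a float in [0, 1), so scaling by n_items and flooring
    gives an index from 0 to n_items - 1.
    """
    index = math.floor(rng.random() * n_items)  # 0 .. n_items - 1
    return index + 1                             # 1 .. n_items

# Hypothetical placeholders standing in for the handwritten shortlist.
shortlist = [f"candidate-model-{i}" for i in range(1, 11)]

position = pick_position(len(shortlist))
print(f"Position {position}: {shortlist[position - 1]}")
```

Passing a seeded `random.Random(...)` as `rng` makes the pick reproducible, in case you want to prove your "random" choice was honest.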

Since this particular AI model was known to be good at code generation, I thought: let me make it write Python code, specifically Manim code (3Blue1Brown mentioned!). Manim is the Python library used to create mathematical animations.

I was finally ready to do the magical fine-tuning, so I searched the web for how fine-tuning code is written and read a bunch of docs. I hopped around Hugging Face, DigitalOcean, etc. and finally landed on Google Colab, since it let me do it for free on an NVIDIA T4 GPU while the others charge you.

Just as I was about to write the code, I remembered I needed a dataset. So there I was again, surfing the web, and again ChatGPT came to the rescue with two datasets: one with 1,000 prompt/completion pairs and another with around 14,000. Then I wrote the fine-tuning code by reading some docs and asking AI for help. After all this hustle, it was finally time to do the tuning.
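The post doesn't show what the datasets looked like. Prompt/completion data is often stored as JSON Lines (one JSON object per line), so here is a minimal Python sketch under that assumption. The field names `prompt` and `completion`, the sample rows, and the helpers `load_pairs` and `train_val_split` are all hypothetical. The sketch drops incomplete rows and holds out a small validation split before training. For an encoder-decoder model like CodeT5+, the prompt would become the encoder input and the completion the target sequence.

```python
import json
import random

def load_pairs(lines):
    """Parse JSON Lines into (prompt, completion) tuples.

    Assumes each record has "prompt" and "completion" keys (hypothetical
    layout); blank lines and records missing either field are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        prompt = (record.get("prompt") or "").strip()
        completion = (record.get("completion") or "").strip()
        if prompt and completion:
            pairs.append((prompt, completion))
    return pairs

def train_val_split(pairs, val_fraction=0.1, seed=42):
    """Shuffle a copy of pairs and hold out val_fraction of them for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical sample rows in the assumed format (Manim snippets as completions).
sample_lines = [
    json.dumps({"prompt": "Draw a blue circle", "completion": "circle = Circle(color=BLUE)"}),
    json.dumps({"prompt": "Draw a square", "completion": "square = Square()"}),
    json.dumps({"prompt": "Write Hello", "completion": 'text = Text("Hello")'}),
    json.dumps({"prompt": "Fade in a circle", "completion": "self.play(FadeIn(Circle()))"}),
    json.dumps({"prompt": "Row with an empty completion", "completion": ""}),
    "",
]

pairs = load_pairs(sample_lines)
train, val = train_val_split(pairs, val_fraction=0.25)
print(len(pairs), len(train), len(val))  # prints: 4 3 1
```

Fixing the seed keeps the split identical across Colab sessions, which matters if a disconnect forces you to start the run over.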

Then came the first issue: a bug that only showed up with Hugging Face on Google Colab. Me, being dumb, spent 2 hours trying to debug it without AI, and when I finally asked AI, I found out the problem wouldn't have happened on any platform other than Google Colab. After that, I fine-tuned the model on the first dataset in a very short time, like 30 minutes. I downloaded the model, wrote the code for a CLI, and was feeling good: it worked, but it needed to be done with the bigger dataset and…

And here came the challenges with the big 14,000-example dataset: the cell that iterates through the dataset showed me an estimate of 2 hrs 40 mins of compute. I stared at my screen for 90 mins straight, then switched tabs to work on something else. I worked for around 30 mins, went back to check the progress and boom!! “Your runtime has been disconnected due to inactivity” :(( Two hours of progress gone. Me being me, I started again, but this time I changed my system settings so the computer never sleeps when idle, and took plenty of other precautions. And I actually completed fine-tuning a model on the large dataset. How were the results?

Not good, though! Better than the previous iteration? Maybe. Disappointed? Not at all. Will I do it again? Most probably tmrw. It was never about getting it done perfectly and using it, but about learning how fine-tuning is done and how this magical sort of thing, which didn’t exist less than 5 years ago, works under the hood. Do I know everything about fine-tuning? Absolutely not. But I can very confidently say “I fine-tuned an AI model”, and I can also say something I heard or read somewhere: “You can just do things.”

As this is my first blog, I want to add my favorite quote. I don’t know where I heard it, but it goes like: “People overestimate what they can do in a day, and underestimate what they can do in a decade.”

GRIND AND SHINE...