Introduction 

In in-context learning, a large language model performs inference on an input prompt containing relevant context and generates the required completion. This technique works well for larger models with billions of parameters, like GPT-4, but is poorly suited to efficient inference with smaller models like GPT-2. One-shot and few-shot learning techniques also take up part of the model's context window, reducing the amount of input information available for generating the required completion. To increase the quality of the generated completions, one must perform instruction-based finetuning.


Instruction-based Finetuning of LLMs

Finetuning involves training a pretrained LLM to generate the desired completion for a given input prompt. To achieve this, the model is trained on a dataset of prompt-completion pairs. Finetuning is a supervised learning process: the input prompts serve as inputs and the desired completions as target outputs. It is usually performed as an additional step on top of a pretrained model.
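For instance, a tiny instruction dataset might look like the following in Python; the field names are illustrative rather than a fixed standard:

# A toy supervised finetuning dataset: each record pairs an input prompt
# with the desired completion. Field names are illustrative; formats vary.
instruction_data = [
    {"prompt": "Summarize: The stock market rallied today after ...",
     "completion": "Markets rose on positive economic news."},
    {"prompt": "Translate to French: Good morning.",
     "completion": "Bonjour."},
]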

Pretraining is a step performed before finetuning; it is a self-supervised learning technique usually used for domain adaptation and for learning the patterns of the underlying language.

Pretrained models are not task-specific and cannot directly perform downstream tasks like question answering or sentiment analysis. During pretraining, the primary focus is on teaching the model to comprehend and analyze the language's patterns and linguistic characteristics rather than on specific tasks. Further finetuning of this pretrained model on an instruction dataset, i.e., a dataset of defined tasks with input prompts and desired completions, enables the model to learn the patterns and knowledge required to perform those tasks.

Currently available models like GPT-4, Falcon, and Llama are finetuned on curated instruction datasets and can perform tasks like question answering, text summarization, and code generation when given suitable prompts. Let's walk through an overview of the instruction-based finetuning process for LLMs.

Overview of LLM finetuning process

Instruction-based finetuning of LLMs involves the following steps:

First, the instruction dataset is prepared. It consists of input prompts along with desired completions in a specific prompt format. An instruction dataset is usually created to support a variety of tasks by utilizing existing prompt templates for different tasks; the templates are combined with data sources to produce the dataset. The pretrained model then undergoes finetuning so that it learns to generate coherent, accurate text for the given instructions. This is achieved through backpropagation, which allows the model to learn from its mistakes and improve its performance.

The model will then generate the required, relevant completion when an unseen input is fed into it using the same prompt template used for training. Designing new prompts with that template when performing text generation at inference time is known as prompt engineering. Below is the Stanford Alpaca prompt format, created for finetuning a variant of the Llama model:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
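As a concrete illustration, here is a minimal sketch of how a record from a data source can be combined with the Alpaca template above to produce a training example, and how the same template, with the response left empty, is reused as an inference-time prompt. The helper functions and the sample record are hypothetical:

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_training_example(record: dict) -> str:
    # For finetuning, the desired completion is appended after the
    # "### Response:" marker so the model learns to generate it.
    return ALPACA_TEMPLATE.format(**record) + record["output"]

def build_inference_prompt(instruction: str, input_text: str) -> str:
    # At inference time the response is left empty for the model to fill in.
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)

record = {
    "instruction": "Summarize the text.",
    "input": "Large language models are pretrained on web-scale corpora ...",
    "output": "LLMs learn general language patterns from large corpora.",
}
print(build_training_example(record))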

After preparing the instruction dataset, it should be split into training and validation sets, since finetuning is a supervised learning algorithm.
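A minimal sketch of such a split using the Hugging Face datasets library, assuming the formatted examples have been saved to a JSON-lines file (the file name and split ratio are illustrative):

from datasets import load_dataset

# Load a JSON-lines instruction dataset; the file name is illustrative.
dataset = load_dataset("json", data_files="instruction_data.jsonl", split="train")

# Hold out 10% of the prompt-completion pairs for validation.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits["train"], splits["test"]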

Then, the pretrained model is further trained on the prompt-based dataset using backpropagation. Finetuning a trained model for downstream applications in this way is called transfer learning: the model leverages the knowledge and patterns learnt during pretraining to quickly pick up the structure and context of the specific tasks from the prompt-completion pairs in the dataset.
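As an illustration, here is a pared-down sketch of this training step using the Hugging Face transformers Trainer with GPT-2 as the small pretrained model. It assumes the train_ds and val_ds splits from above, with each record holding the fully formatted prompt-plus-completion string in a "text" field; the hyperparameters are illustrative, not a recipe:

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    # "text" holds the formatted prompt + completion string for each example.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_tok = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)
val_tok = val_ds.map(tokenize, batched=True, remove_columns=val_ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=train_tok,
    eval_dataset=val_tok,
    # For causal LM finetuning the collator copies input_ids into labels,
    # so cross-entropy loss is backpropagated through the whole sequence.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()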

Different types of finetuning and when to use them

Finetuning can be performed in different ways depending on the number of input samples available, the nature of the task being finetuned for, and the requirements at hand. Based on these constraints, finetuning can be performed in the following ways:

Finetuning on a Single Task

Finetuning on a single task means training the model on a prompt-based dataset consisting of prompt-completion pairs for that one task. Only around 500 to 1,000 prompt-completion pairs are typically needed, which makes this technique beneficial when the available data is minimal. It is usually applied when training domain-specific models, e.g., in aerospace or healthcare, where curated domain-specific datasets are not abundantly available.

However, finetuning a pretrained model on a single task can lead to training issues such as overfitting and catastrophic forgetting. When the prompt dataset contains only one type of task, the model may overfit and fail to learn any significant patterns from the dataset. The model may also forget the knowledge, understanding, and patterns learnt during pretraining; this phenomenon is called catastrophic forgetting.

Catastrophic forgetting might not be an issue if the intended outcome of the finetuning is to perform a single task (with no multi-tasking) and the previously learnt knowledge is not required. In most cases, however, it is a drawback, and it can be mitigated by techniques such as multi-task instruction finetuning, discussed below.

In practice, this technique is applied primarily for domain-specific finetuning targeting a single task.

Multi-Task Instruction Finetuning

Multi-task instruction finetuning means finetuning a pretrained model on a prompt dataset covering multiple tasks simultaneously, such as text summarization, machine translation, and question answering. This enables the model to learn more varied patterns from the dataset and helps prevent overfitting, and it is one of the key methods used to mitigate catastrophic forgetting when finetuning large language models. The trade-off is that it requires a large amount of curated prompt data, combined across tasks as in the sketch below.
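A minimal sketch of assembling such a multi-task mixture with the Hugging Face datasets library; the per-task files and sampling weights are illustrative:

from datasets import interleave_datasets, load_dataset

# Hypothetical per-task instruction datasets; file names are illustrative.
summarization = load_dataset("json", data_files="summarization.jsonl", split="train")
translation = load_dataset("json", data_files="translation.jsonl", split="train")
qa = load_dataset("json", data_files="qa.jsonl", split="train")

# Interleave the tasks so each training batch mixes task types, which helps
# the model retain broad abilities instead of overfitting to a single task.
multi_task_ds = interleave_datasets(
    [summarization, translation, qa],
    probabilities=[0.4, 0.3, 0.3],  # illustrative sampling weights
    seed=42,
)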

Some popular multi-billion-parameter models, like GPT-4, ChatGPT, Llama, PaLM, and Falcon, are trained using multi-task instruction-based finetuning with sufficient data samples for each task in the prompt dataset.

Conclusion

Finetuning is the process of training pretrained models on a prompt-based dataset, resulting in models with zero-shot capabilities on the tasks they were trained for. It enables finetuned models to perform the required tasks by generating relevant completions for a given input prompt, with higher accuracy and more coherent text. It is therefore a vital step in training LLMs and putting them to work on required tasks.

Summary

To summarise:

- Instruction-based finetuning trains a pretrained LLM on prompt-completion pairs so it can follow instructions directly, without spending the context window on in-context examples.
- Finetuning is a supervised learning step performed on top of self-supervised pretraining, and the instruction dataset should be split into training and validation sets.
- Single-task finetuning works with as few as 500 to 1,000 examples but risks overfitting and catastrophic forgetting.
- Multi-task instruction finetuning mitigates catastrophic forgetting but requires much larger curated instruction datasets.

Thank you for reading!

