Introduction 

In in-context learning, a large language model performs inference on an input prompt containing relevant context and generates the required completion. This technique works well for larger models with billions of parameters, like GPT-4, but is poorly suited to efficient inference with smaller models like GPT-2. One-shot and few-shot learning techniques also take up part of the model's context window, reducing the amount of input information available for generating the required completion. To increase the quality of the generated completions, one must perform instruction-based finetuning.


Instruction-based Finetuning of LLMs

Finetuning involves training a pretrained LLM to generate the desired completion for a given input prompt. To achieve this, the model is trained on a dataset of prompt-completion pairs. Finetuning is a supervised learning process: the input prompts serve as inputs and the desired completions as target outputs. It is usually performed as an additional step on top of a pretrained model.
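For instance, a tiny instruction dataset might look like the following in Python; the field names are illustrative rather than a fixed standard:

# A toy supervised finetuning dataset: each record pairs an input prompt
# with the desired completion. Field names are illustrative; formats vary.
instruction_data = [
    {"prompt": "Summarize: The stock market rallied today after ...",
     "completion": "Markets rose on positive economic news."},
    {"prompt": "Translate to French: Good morning.",
     "completion": "Bonjour."},
]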

Pretraining is a step performed before finetuning; it is a self-supervised learning technique usually used for domain adaptation and for learning the patterns of the underlying language.

Pretrained models are not task-specific and cannot directly perform downstream tasks like question answering or sentiment analysis. During pretraining, the primary focus is on teaching the model to comprehend and analyze the language's patterns and linguistic characteristics rather than on specific tasks. Further finetuning of this pretrained model on an instruction dataset, i.e., a dataset of defined tasks with input prompts and desired completions, enables the model to learn the patterns and knowledge required to perform those tasks.

Currently available models like GPT-4, Falcon, and Llama are finetuned on curated instruction datasets and can perform tasks like question answering, text summarization, and code generation when given suitable prompts. Let's walk through an overview of the instruction-based finetuning process for LLMs.

Overview of LLM finetuning process

Instruction-based finetuning of LLMs involves the following steps:

First, the instruction dataset is prepared. It consists of input prompts along with desired completions in a specific prompt format. An instruction dataset is usually created to support a variety of tasks by utilizing existing prompt templates for different tasks; the templates are combined with data sources to produce the dataset. The pretrained model then undergoes finetuning so that it learns to generate coherent, accurate text for the given instructions. This is achieved through backpropagation, which allows the model to learn from its mistakes and improve its performance.

The model will then generate the required, relevant completion when an unseen input is fed into it using the same prompt template used for training. Designing new prompts with that template when performing text generation at inference time is known as prompt engineering. Below is the Stanford Alpaca prompt format, created for finetuning a variant of the Llama model:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
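As a concrete illustration, here is a minimal sketch of how a record from a data source can be combined with the Alpaca template above to produce a training example, and how the same template, with the response left empty, is reused as an inference-time prompt. The helper functions and the sample record are hypothetical:

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_training_example(record: dict) -> str:
    # For finetuning, the desired completion is appended after the
    # "### Response:" marker so the model learns to generate it.
    return ALPACA_TEMPLATE.format(**record) + record["output"]

def build_inference_prompt(instruction: str, input_text: str) -> str:
    # At inference time the response is left empty for the model to fill in.
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)

record = {
    "instruction": "Summarize the text.",
    "input": "Large language models are pretrained on web-scale corpora ...",
    "output": "LLMs learn general language patterns from large corpora.",
}
print(build_training_example(record))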

After preparing the instruction dataset, it should be split into training and validation sets, since finetuning is a supervised learning algorithm.
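A minimal sketch of such a split using the Hugging Face datasets library, assuming the formatted examples have been saved to a JSON-lines file (the file name and split ratio are illustrative):

from datasets import load_dataset

# Load a JSON-lines instruction dataset; the file name is illustrative.
dataset = load_dataset("json", data_files="instruction_data.jsonl", split="train")

# Hold out 10% of the prompt-completion pairs for validation.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits["train"], splits["test"]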

Then, the pretrained model is further trained on the prompt-based dataset using backpropagation. Finetuning a trained model for downstream applications in this way is called transfer learning: the model leverages the knowledge and patterns learnt during pretraining to quickly pick up the structure and context of the specific tasks from the prompt-completion pairs in the dataset.
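As an illustration, here is a pared-down sketch of this training step using the Hugging Face transformers Trainer with GPT-2 as the small pretrained model. It assumes the train_ds and val_ds splits from above, with each record holding the fully formatted prompt-plus-completion string in a "text" field; the hyperparameters are illustrative, not a recipe:

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    # "text" holds the formatted prompt + completion string for each example.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_tok = train_ds.map(tokenize, batched=True, remove_columns=train_ds.column_names)
val_tok = val_ds.map(tokenize, batched=True, remove_columns=val_ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=train_tok,
    eval_dataset=val_tok,
    # For causal LM finetuning the collator copies input_ids into labels,
    # so cross-entropy loss is backpropagated through the whole sequence.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()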

Different types of finetuning and when to use them

Finetuning can be performed in different ways depending on the number of input samples available, the nature of the task being finetuned for, and the requirements at hand. Based on these constraints, finetuning can be performed in the following ways:

Finetuning on a Single Task

Finetuning on a single task means training the model on a prompt-based dataset consisting of prompt-completion pairs for that one task. Only around 500 to 1,000 prompt-completion pairs are typically needed, which makes this technique beneficial when the available data is minimal. It is usually applied when training domain-specific models, e.g., in aerospace or healthcare, where curated domain-specific datasets are not abundantly available.

However, finetuning a pretrained model on a single task can lead to training issues such as overfitting and catastrophic forgetting. When the prompt dataset contains only one type of task, the model may overfit and fail to learn any significant patterns from the dataset. The model may also forget the knowledge, understanding, and patterns learnt during pretraining; this phenomenon is called catastrophic forgetting.

Catastrophic forgetting might not be an issue if the intended outcome of the finetuning is to perform a single task (with no multi-tasking) and the previously learnt knowledge is not required. In most cases, however, it is a drawback, and it can be mitigated by techniques such as multi-task instruction finetuning, discussed below.

In practice, this technique is applied primarily for domain-specific finetuning targeting a single task.

Multi-Task Instruction Finetuning

Multi-task instruction finetuning means finetuning a pretrained model on a prompt dataset covering multiple tasks simultaneously, such as text summarization, machine translation, and question answering. This enables the model to learn more varied patterns from the dataset and helps prevent overfitting, and it is one of the key methods used to mitigate catastrophic forgetting when finetuning large language models. The trade-off is that it requires a large amount of curated prompt data, combined across tasks as in the sketch below.
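A minimal sketch of assembling such a multi-task mixture with the Hugging Face datasets library; the per-task files and sampling weights are illustrative:

from datasets import interleave_datasets, load_dataset

# Hypothetical per-task instruction datasets; file names are illustrative.
summarization = load_dataset("json", data_files="summarization.jsonl", split="train")
translation = load_dataset("json", data_files="translation.jsonl", split="train")
qa = load_dataset("json", data_files="qa.jsonl", split="train")

# Interleave the tasks so each training batch mixes task types, which helps
# the model retain broad abilities instead of overfitting to a single task.
multi_task_ds = interleave_datasets(
    [summarization, translation, qa],
    probabilities=[0.4, 0.3, 0.3],  # illustrative sampling weights
    seed=42,
)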

Some popular multi-billion-parameter models, like GPT-4, ChatGPT, Llama, PaLM, and Falcon, are trained using multi-task instruction-based finetuning with sufficient data samples for each task in the prompt dataset.

Conclusion

Finetuning is the process of training pretrained models on a prompt-based dataset, resulting in models with zero-shot capabilities on the tasks they were trained for. It enables finetuned models to perform the required tasks by generating relevant completions for a given input prompt, with higher accuracy and more coherent text. It is therefore a vital step in training LLMs and putting them to work on required tasks.

Summary

To summarise:

- Instruction-based finetuning trains a pretrained LLM on prompt-completion pairs so it can follow instructions directly, without spending the context window on in-context examples.
- Finetuning is a supervised learning step performed on top of self-supervised pretraining, and the instruction dataset should be split into training and validation sets.
- Single-task finetuning works with as few as 500 to 1,000 examples but risks overfitting and catastrophic forgetting.
- Multi-task instruction finetuning mitigates catastrophic forgetting but requires much larger curated instruction datasets.

Thank you for reading!

