Finetune Mistral 7B using NVIDIA NeMo and PEFT¶
Welcome!
In this notebook, we will use NVIDIA's NeMo Framework to finetune the Mistral 7B LLM. Finetuning is the process of adjusting the weights of a pre-trained foundation model with custom data. Because foundation models can be very large, a family of techniques known as parameter-efficient fine-tuning (PEFT) has gained traction: rather than updating all of the model's weights, PEFT methods train only a small number of additional parameters. PEFT encompasses several methods, including P-Tuning, LoRA, Adapters, and IA3.
For those interested in a deeper understanding of these methods, we have included a list of additional resources at the end of this document.
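As a quick intuition for one of these methods, the sketch below shows the core idea behind LoRA in plain PyTorch. This is purely illustrative (it is not the NeMo implementation we will use later): the pretrained weight is frozen, and a small low-rank update, scaled by alpha/r, is trained on top of it.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA layer: a frozen base weight plus a trainable low-rank update B @ A."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                 # freeze the pretrained weight
        self.lora_a = nn.Linear(in_features, r, bias=False)    # trainable down-projection (A)
        self.lora_b = nn.Linear(r, out_features, bias=False)   # trainable up-projection (B)
        nn.init.zeros_(self.lora_b.weight)                     # start with a zero update
        self.scale = alpha / r

    def forward(self, x):
        # y = base(x) + (alpha/r) * B(A(x)) -- only A and B receive gradients
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))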
A note about running Jupyter Notebooks: press Shift + Enter to run a cell. A * in the left-hand cell box means the cell is running; a number means it has completed. If your notebook is misbehaving, you can interrupt a long-running process from the Kernel tab (Kernel tab -> Interrupt Kernel) or restart the kernel entirely (Kernel tab -> Restart Kernel). Note that restarting the kernel will require you to rerun everything from the beginning.
Deploy¶
Click the badge above to deploy this notebook inside the NeMo container with no setup, powered by Brev! Brev makes it easy to provision GPUs and deploy resources with a single click!
NeMo Tools and Resources:¶
NVIDIA/NeMo-Megatron-Launcher: NeMo Megatron launcher and tools (github.com)
NeMo/examples/nlp/language_modeling/tuning at main · NVIDIA/NeMo · GitHub
Requirements:¶
Software:¶
- NeMo Framework Container, version 23.05 or later
- Docker
- NVIDIA AI Enterprise Product Support Matrix
Hardware:¶
- 1x A100 GPU, preferably 80GB
Prepare the base model¶
If you already have a .nemo file for the Mistral model in your directory, you can skip this step.
Otherwise, run the following cells to download the model from Hugging Face and convert it to NeMo format.
!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension
!mkdir -p models/mistral7b
# Download the Mistral 7B checkpoint from Hugging Face into models/mistral7b
import huggingface_hub
huggingface_hub.snapshot_download(repo_id="mistralai/Mistral-7B-v0.1", local_dir="models/mistral7b", local_dir_use_symlinks=False)
# Convert the Hugging Face checkpoint into NeMo format (models/mistral7b.nemo)
!python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_mistral_7b_to_nemo.py --in-file=models/mistral7b --out-file=models/mistral7b.nemo
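As an optional sanity check (the path below matches the --out-file argument above), you can confirm that the conversion produced the .nemo checkpoint before moving on:
import os
assert os.path.exists("models/mistral7b.nemo"), "models/mistral7b.nemo not found -- rerun the conversion step"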
Prepare Data¶
Next, we'll need to prepare the data that we're going to use for our LoRA fine-tuning. Here we're going to use the PubMedQA dataset, and we'll train our model to respond with short "yes" / "no" / "maybe" answers.
First, let's download the data and divide it into train/validation/test splits:
!git clone https://github.com/pubmedqa/pubmedqa.git
!cd pubmedqa/preprocess && python split_dataset.py pqal
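If the split ran correctly, you should now see per-fold train/dev files under pubmedqa/data (the fold 0 files are the ones we'll read below):
!ls pubmedqa/data/pqal_fold0/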
Now we can convert the PubMedQA data into the JSONL format that NeMo expects for parameter-efficient fine-tuning. We'll also reformat the data into prompts that our model can appropriately handle.
import json

def write_jsonl(fname, json_objs):
    # Write one JSON object per line (the JSONL format NeMo expects).
    with open(fname, 'wt') as f:
        for o in json_objs:
            f.write(json.dumps(o) + "\n")

def form_question(obj):
    # Build the prompt: the question, followed by all context passages, followed by the answer cue.
    st = ""
    st += f"QUESTION:{obj['QUESTION']}\n"
    st += "CONTEXT: "
    for context in obj['CONTEXTS']:
        st += f"{context}\n"
    st += "TARGET: the answer to the question given the context is (yes|no|maybe): "
    return st

def convert_to_jsonl(data_path, output_path):
    # Convert a PubMedQA JSON file into NeMo's {"input": ..., "output": ...} JSONL format.
    with open(data_path, 'rt') as f:
        data = json.load(f)
    json_objs = []
    for k in data.keys():
        obj = data[k]
        prompt = form_question(obj)
        completion = obj['reasoning_required_pred']
        json_objs.append({"input": prompt, "output": completion})
    write_jsonl(output_path, json_objs)
    return json_objs

test_json_objs = convert_to_jsonl("pubmedqa/data/test_set.json", "pubmedqa_test.jsonl")
train_json_objs = convert_to_jsonl("pubmedqa/data/pqal_fold0/train_set.json", "pubmedqa_train.jsonl")
dev_json_objs = convert_to_jsonl("pubmedqa/data/pqal_fold0/dev_set.json", "pubmedqa_val.jsonl")
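It's worth quickly confirming how many examples ended up in each split:
print(f"train: {len(train_json_objs)}, validation: {len(dev_json_objs)}, test: {len(test_json_objs)}")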
Here's an example of what the data looks like
test_json_objs[0]
Run Training¶
NeMo Framework uses config objects to control many of its operations, which makes it easy to see which options you can change and to carry out different experiments. We can start by downloading an example config file from GitHub.
!wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml
Now we'll read in this default config file with Hydra and apply an override that enables the use of Megatron Core.
import hydra
from omegaconf.omegaconf import OmegaConf
hydra.initialize(version_base=None, config_path=".")
cfg = hydra.compose(config_name="megatron_gpt_finetuning_config", overrides=['++model.mcore_gpt=True'])
To see all of the different configuration options available, you can take a look at the file we downloaded. For this example, we're going to update a couple of settings to point to our datasets and run LoRA tuning on our A100. Feel free to experiment with these different options!
For our data configuration, we'll point to the JSONL files we wrote out earlier. The concat_sampling_probabilities setting determines what proportion of the finetuning data should come from each file; since we only have one training file in this example, we use [1.0].
OmegaConf.update(cfg, "model.data", {
"train_ds": {
"num_workers": 0,
"file_names": ["pubmedqa_train.jsonl"],
"concat_sampling_probabilities": [1.0]
},
"validation_ds": {
"num_workers": 0,
"file_names": ["pubmedqa_val.jsonl"]
},
"test_ds": {
"file_names": ["pubmedqa_test.jsonl"],
"names": ["pubmedqa"]
}
}, merge=True)
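If you later want to blend multiple training files, the same block takes parallel lists of file names and sampling probabilities. The snippet below is a hypothetical example (the second file name is made up) and is not used in this run:
# Hypothetical only -- not executed in this notebook.
# Draw 70% of training samples from PubMedQA and 30% from a second (made-up) dataset.
example_train_ds = {
    "file_names": ["pubmedqa_train.jsonl", "extra_medical_qa.jsonl"],
    "concat_sampling_probabilities": [0.7, 0.3],
}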
For our model settings, we don't have much to change since we're reading in a pretrained model. We need to point to our existing converted .nemo file, specify that we want to use LoRA as our finetuning scheme, and choose our parallelism and batch size values. The values below should be appropriate for a single A100 GPU.
OmegaConf.update(cfg, "model", {
"restore_from_path": "models/mistral7b.nemo",
"peft": {
"peft_scheme": "lora"
},
"tensor_model_parallel_size": 1,
"pipeline_model_parallel_size": 1,
"micro_batch_size": 1,
"global_batch_size": 8,
}, merge=True)
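As a rough sanity check on these batch settings, assuming the usual Megatron relationship global_batch_size = micro_batch_size x data_parallel_size x gradient_accumulation_steps:
# With tensor and pipeline parallel sizes of 1 on a single GPU, the data-parallel size is 1,
# so a global batch of 8 with a micro batch of 1 implies 8 gradient-accumulation steps.
devices, tp, pp = 1, 1, 1
data_parallel_size = devices // (tp * pp)
grad_accum_steps = 8 // (1 * data_parallel_size)
print(grad_accum_steps)  # 8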
Finally, we set some training-specific options. We're training on 1 GPU on a single node at bfloat16 precision, running validation every 10 steps. For this example we'll also only train for 20 steps.
OmegaConf.update(cfg, "trainer", {
'devices': 1,
'num_nodes': 1,
'precision': "bf16-mixed",
"val_check_interval": 10,
"max_steps": 20
})
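Before launching, you can print the merged trainer settings to confirm the overrides were applied:
print(OmegaConf.to_yaml(cfg.trainer))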
With our configuration set, we're ready to initialize our Trainer object to handle the training loop, and an experiment manager to handle checkpointing and logging. After initializing the Trainer, we can load our model from disk into memory.
from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig
from nemo.utils.exp_manager import exp_manager
trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
exp_manager(trainer, cfg.exp_manager)
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)
Before training our adapter, let's see how the base model performs on the dataset
trainer.test(model)
Now, let's add the LoRA Adapter and train it:
model.add_adapter(LoraPEFTConfig(model_cfg))
trainer.fit(model)
Finally, we can see how the newly finetuned model performs on the test data:
trainer.test(model)
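If you want to keep the tuned model around for later use, you can save it with NeMo's standard save_to API. The output path here is just an example, and depending on your NeMo version and PEFT settings, the saved file may contain only the adapter weights rather than the full model:
# Save the tuned model for later reuse (the path is arbitrary).
model.save_to("models/mistral7b_pubmedqa_lora.nemo")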