Mental health issues are often misunderstood, leading to fear, discomfort, and negative perceptions that media stereotypes exacerbate. Overcoming this stigma requires education, awareness, empathy, and challenging misconceptions, while ensuring that care remains accessible. Mental health is vital for overall well-being, quality of life, and effective daily functioning; left untreated, mental health issues can also worsen physical health, underscoring how closely the two are linked.
The dataset contains 172 question-answer conversations between a patient and a healthcare provider. The dataset can be found here:
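Each conversation can be flattened into a single training prompt before tokenization. The sketch below assumes `question`/`answer` field names and a `<human>`/`<assistant>` tag format; adapt both to the actual dataset schema.

```python
# Sketch: turning one question-answer pair into a single training prompt.
# Field names and tag format are assumptions, not taken from the dataset itself.
def format_example(example: dict) -> str:
    return (
        "<human>: " + example["question"].strip() + "\n"
        "<assistant>: " + example["answer"].strip()
    )

sample = {
    "question": "I feel anxious all the time. What can I do?",
    "answer": "Grounding exercises and talking to a professional can help.",
}
print(format_example(sample))
```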
The notebook used for fine-tuning can be accessed at:
I used a sharded Falcon-7B pre-trained model and fine-tuned it with bitsandbytes and QLoRA, combining quantisation with PEFT techniques. Loading a sharded model with Accelerate allows efficient fine-tuning of large models by dynamically managing memory across the CPU and GPU, making the process feasible in smaller-memory environments.
The original model can be found here:
I chose to perform the fine-tuning on Kaggle because it does not have single-session limits, unlike Colab, which provides about three hours of GPU access per session. Kaggle gives you 30 hours of GPU compute per week and provides around 15GB of memory with two T4 GPUs. So you can train models for multiple hours even with a free account. The entire fine-tuning process took just under five hours.
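A back-of-the-envelope calculation shows why 4-bit quantisation matters on a ~15GB T4: the weights alone of a 7B-parameter model would not fit in half precision, but do fit comfortably at 4 bits. This counts weights only, ignoring activations, optimizer state, and LoRA adapters.

```python
# Rough weight-memory footprint for a 7B-parameter model.
PARAMS = 7_000_000_000

fp16_gb = PARAMS * 2 / 1024**3    # 16-bit: 2 bytes per parameter
int4_gb = PARAMS * 0.5 / 1024**3  # 4-bit NF4: ~0.5 bytes per parameter

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
# fp16 (~13 GB) is tight on a 15GB T4; 4-bit (~3.3 GB) leaves room for training.
```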
The following bitsandbytes quantization config was used during training:
- quant_method: bitsandbytes
- _load_in_8bit: False
- _load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False
The following QLoRA adapter config was used during training:
- quant_method: qlora
- lora_alpha: 32
- lora_dropout: 0.05
- lora_rank: 32
- bias: "none"
- task_type: "CAUSAL_LM"
- target_modules: [
"query_key_value",
"dense",
"dense_h_to_4h",
"dense_4h_to_h",
]
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- training_steps: 100
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- max_seq_length: 256
The PEFT fine-tuned model has been uploaded here:
The model takes less than 3 minutes to generate a response. I have compared the PEFT model's responses with the original model's responses in the falcon7b-finetune-mental-health.ipynb
notebook. The original model tends to hallucinate and generate vague responses, whereas the PEFT model produces more coherent and relevant answers to the questions.
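Generation quality also depends on the decoding settings. The values below are illustrative assumptions, not the notebook's actual settings; they are a reasonable starting point for comparing the two models under identical sampling conditions.

```python
# Sketch: a shared decoding configuration for comparing the base and PEFT models.
# All values here are assumptions to tune, not taken from the notebook.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=256,      # cap response length
    do_sample=True,
    temperature=0.7,         # lower = more deterministic
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.1,  # discourage loops
)
# Pass as model.generate(**inputs, generation_config=gen_config)
```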
I have demonstrated the chatbot capabilities of the models in the gradio-chatbot-app.ipynb
notebook, using Gradio as the demo frontend. Running the notebook gives you a chatbot-like interface where you can input a query and get a response from the model. You can experiment with different hyperparameter settings and check the quality of the generated responses.
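The Gradio wiring follows a standard pattern: a response function handed to `gr.ChatInterface`. The sketch below is an assumed structure, with the model call stubbed out so the wiring can be shown without loading Falcon-7B; in the notebook, the stub body would call `model.generate` on the tokenized prompt instead.

```python
# Sketch: minimal Gradio chat wiring (assumed structure, not the notebook verbatim).
def respond(message: str, history: list) -> str:
    # Placeholder for the real call into the PEFT model; here we echo the
    # message so the UI wiring can run without loading Falcon-7B.
    return "(model reply to) " + message

if __name__ == "__main__":
    import gradio as gr

    demo = gr.ChatInterface(respond)  # chat-style UI around the response function
    demo.launch()
```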
Some conversations with the chatbot are shown below: