File size: 5,168 Bytes

---
base_model: mistralai/Mistral-7B-Instruct-v0.3
library_name: transformers
model_name: Doctor_AI_LoRA-Mistral-7B-Instructritvik77
tags:
- generated_from_trainer
- trl
- medical
- Doctor
- PEFT
- MEDICAL
- AIMEDICAL
- DOCTORai
licence: license
license: apache-2.0
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
pipeline_tag: text-generation
---

# Model Card for Doctor_AI_LoRA-Mistral-7B-Instructritvik77

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
# from peft import PeftModel, PeftConfig
# from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# from datasets import load_dataset
# import torch

# # Quantization config for 4-bit loading
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.bfloat16,
#     bnb_4bit_use_double_quant=True,
# )

# # Repo ID for the PEFT model
# peft_model_id = f"{username}/{output_dir}"  # e.g., ritvik77/Mixtral-7B-LoRA-Salesforce-Optimized-AI-AgentCall
# device = "auto"

# # Load PEFT config from the Hub
# config = PeftConfig.from_pretrained(peft_model_id)

# # Load the base model (e.g., Mistral-7B) with quantization
# model = AutoModelForCausalLM.from_pretrained(
#     config.base_model_name_or_path,  # Base model ID stored in PEFT config
#     device_map="auto",
#     quantization_config=bnb_config,  # Apply 4-bit quantization
# )

# # Load tokenizer from the PEFT model repo
# tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# # Resize token embeddings to match tokenizer (if needed)
# model.resize_token_embeddings(len(tokenizer))

# # Load PEFT adapters and apply them to the base model
# model = PeftModel.from_pretrained(model, peft_model_id)

# # Convert model to bfloat16 and set to evaluation mode
# model.to(torch.bfloat16)
# model.eval()

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, PeftConfig

# ✅ Quantization config for 4-bit loading (Memory Optimization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # ✅ Improved precision for LoRA weights
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,      # ✅ Reduces VRAM overhead
)

# ✅ Load tokenizer from fine-tuned checkpoint (Ensures token consistency)
peft_model_id = "ritvik77/Doctor_AI_LoRA-Mistral-7B-Instructritvik77"
tokenizer = AutoTokenizer.from_pretrained(peft_model_id, trust_remote_code=True)

# ✅ Ensure `pad_token` is correctly assigned
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# ✅ Load Base Model with Quantization for Memory Efficiency
model_name = "mistralai/Mistral-7B-Instruct-v0.3"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                # ✅ Efficiently maps to available GPUs
    quantization_config=bnb_config,   # ✅ Efficient quantization for large models
    torch_dtype=torch.bfloat16
)

# ✅ Resize Token Embeddings BEFORE Loading LoRA Adapter (Prevents size mismatch)
model.resize_token_embeddings(len(tokenizer))

# ✅ Load PEFT Adapter (LoRA Weights)
model = PeftModel.from_pretrained(model, peft_model_id)

# ✅ Unfreeze LoRA layers to ensure they are trainable
for name, param in model.named_parameters():
    if "lora" in name:
        param.requires_grad = True

# ✅ Confirm LoRA Layers Are Active
if hasattr(model, 'print_trainable_parameters'):
    model.print_trainable_parameters()
else:
    print("❗ Warning: LoRA adapter may not have loaded correctly.")

# ✅ Ensure model is in evaluation mode for inference
model.eval()

# ✅ Sample Inference Code
def generate_response(prompt, max_new_tokens=300, temperature=0.7):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=temperature
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# ✅ Sample Prompt for Medical Diagnosis
prompt = "Patient reports chest pain and shortness of breath. What might be the diagnosis?"
response = generate_response(prompt)
print("\n🩺 **Diagnosis:**", response)

print("🚀 PEFT model loaded successfully with resized embeddings!")





## Training procedure

 


This model was trained with SFT.

### Framework versions

- TRL: 0.15.2
- Transformers: 4.48.3
- Pytorch: 2.5.1+cu124
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citations



Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```