Finetuned TinyLLaMA Chat (4-bit, LoRA) – Healthcare Assistant

This model is a LoRA adapter fine-tuned from the 4-bit quantized TinyLlama/TinyLlama-1.1B-Chat-v1.0, using the Unsloth framework and Hugging Face's PEFT library.

It is designed as a lightweight healthcare assistant that answers common health-related questions with informative, conversational responses.

⚠️ Note: This model is currently in the build stage and was trained on a small synthetic dataset. It is intended for research and experimentation only, not for clinical use.
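For quick experimentation, the adapter can be loaded on top of the 4-bit base model with transformers and peft. The snippet below is a minimal sketch, not official usage: it assumes bitsandbytes is installed, a CUDA GPU is available, and the example question is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "Dhivs/tinyllama-healthcare-4bit-lora"

# Load the base model in 4-bit, then attach the LoRA adapter.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)

# TinyLlama-Chat ships a chat template; the question below is illustrative.
messages = [{"role": "user", "content": "What are common symptoms of dehydration?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```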


🧠 Model Details

  • Model Name: tinyllama-healthcare-4bit-lora
  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Fine-tuned with: LoRA (Low-Rank Adaptation) + 4-bit quantization via Unsloth
  • Library: peft, transformers, unsloth
  • Language: English
  • License: Apache 2.0 (inherited from base)
  • Instruction Format: Medical question-answering (symptoms, conditions, advice)

🩺 Intended Use

✅ Intended Applications

  • Patient education bots
  • General health Q&A assistants
  • Symptom explainer tools
  • On-device medical chatbots (informational)

❌ Not Intended For

  • Real-time clinical diagnosis or decision-making
  • Emergency or high-risk medical settings
  • Professional medical consultation replacement

⚠️ Bias, Risks, and Limitations

  • This model may hallucinate or produce oversimplified, incorrect advice
  • It was not trained on real clinical datasets
  • May reflect bias from public medical Q&A examples
  • Should not be trusted for personalized or critical health advice

💡 Recommendations

  • Use only in low-stakes, informational or prototyping contexts
  • Always encourage users to consult healthcare professionals
  • Evaluate thoroughly before deployment in any production use case

🛠️ Training Details

Dataset

  • Format: JSONL with instruction, input, output
  • Size: 1000 examples
  • Split: 90% train / 10% eval
  • Max length: 230 tokens

LoRA Configuration

  • Target modules: q_proj, v_proj
  • r: 8, alpha: 16, dropout: 0.05
  • Gradient checkpointing: Enabled

Training Environment

  • Hardware: 1Γ— NVIDIA Tesla T4 (16GB)
  • Duration: ~2.9 minutes total
  • Precision: bfloat16 (fallback to fp16)
  • Framework: unsloth.FastLanguageModel + Hugging Face Trainer

📈 Evaluation

  • Final training loss: ~1.08
  • Evaluation loss: Low β€” but limited generalization expected
  • Benchmarks: Not evaluated on public QA or health benchmarks

Next Steps

  • Incorporate multi-turn dialog examples
  • Train with more realistic, diverse health datasets
  • Evaluate with medically focused test suites

🌱 Environmental Impact

  • GPU Used: Tesla T4
  • Training Time: < 3 minutes
  • Estimated Carbon Emission: ~0.05 kgCOβ‚‚eq (via ML CO2 calculator)

🔍 Sources & Dependencies
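
  • Base model: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Unsloth: https://github.com/unslothai/unsloth
  • PEFT: https://github.com/huggingface/peft
  • Transformers: https://github.com/huggingface/transformers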


📄 Citation

@misc{tinyllama_healthcare_2025,
  title={TinyLLaMA Healthcare Assistant via LoRA (Finetuned)},
  author={Dhivakar G},
  year={2025},
  howpublished={\url{https://huggingface.co/Dhivs/tinyllama-healthcare-4bit-lora}},
}

👀 Model Contact
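
  • Author: Dhivakar G (Dhivs on Hugging Face)
  • Model page: https://huggingface.co/Dhivs/tinyllama-healthcare-4bit-lora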


🧰 Framework Versions

  • peft: 0.16.0
  • transformers: 4.53.3
  • torch: 2.7.1+cu126
  • unsloth: 2025.7.8
  • CUDA Toolkit: 12.6
