Finetuned TinyLLaMA Chat (4-bit, LoRA) – Healthcare Assistant

This model is a LoRA adapter fine-tuned from the 4-bit quantized TinyLlama/TinyLlama-1.1B-Chat-v1.0, using the Unsloth framework and Hugging Face's PEFT library.

It is designed as a lightweight healthcare assistant that answers common health-related questions with informative, conversational responses.

⚠️ Note: This model is currently in the build stage and was trained on a small synthetic dataset. It is intended for research and experimentation only, not for clinical use.
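For quick experimentation, the adapter can be loaded on top of the 4-bit base model with transformers and peft. The snippet below is a minimal sketch, not official usage: it assumes bitsandbytes is installed, a CUDA GPU is available, and the example question is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
ADAPTER = "Dhivs/tinyllama-healthcare-4bit-lora"

# Load the base model in 4-bit, then attach the LoRA adapter.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)

# TinyLlama-Chat ships a chat template; the question below is illustrative.
messages = [{"role": "user", "content": "What are common symptoms of dehydration?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```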


🧠 Model Details

  • Model Name: tinyllama-healthcare-4bit-lora
  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Fine-tuned with: LoRA (Low-Rank Adaptation) + 4-bit quantization via Unsloth
  • Library: peft, transformers, unsloth
  • Language: English
  • License: Apache 2.0 (inherited from base)
  • Instruction Format: Medical question-answering (symptoms, conditions, advice)

🩺 Intended Use

✅ Intended Applications

  • Patient education bots
  • General health Q&A assistants
  • Symptom explainer tools
  • On-device medical chatbots (informational)

❌ Not Intended For

  • Real-time clinical diagnosis or decision-making
  • Emergency or high-risk medical settings
  • Professional medical consultation replacement

⚠️ Bias, Risks, and Limitations

  • This model may hallucinate or produce oversimplified, incorrect advice
  • It was not trained on real clinical datasets
  • May reflect bias from public medical Q&A examples
  • Should not be trusted for personalized or critical health advice

💡 Recommendations

  • Use only in low-stakes, informational or prototyping contexts
  • Always encourage users to consult healthcare professionals
  • Evaluate thoroughly before deployment in any production use case

🛠️ Training Details

Dataset

  • Format: JSONL with instruction, input, output
  • Size: 1000 examples
  • Split: 90% train / 10% eval
  • Max length: 230 tokens

LoRA Configuration

  • Target modules: q_proj, v_proj
  • r: 8, alpha: 16, dropout: 0.05
  • Gradient checkpointing: Enabled

Training Environment

  • Hardware: 1Γ— NVIDIA Tesla T4 (16GB)
  • Duration: ~2.9 minutes total
  • Precision: bfloat16 (fallback to fp16)
  • Framework: unsloth.FastLanguageModel + Hugging Face Trainer

📈 Evaluation

  • Final training loss: ~1.08
  • Evaluation loss: Low β€” but limited generalization expected
  • Benchmarks: Not evaluated on public QA or health benchmarks

Next Steps

  • Incorporate multi-turn dialog examples
  • Train with more realistic, diverse health datasets
  • Evaluate with medically focused test suites

🌱 Environmental Impact

  • GPU Used: Tesla T4
  • Training Time: < 3 minutes
  • Estimated Carbon Emission: ~0.05 kgCOβ‚‚eq (via ML CO2 calculator)

🔍 Sources & Dependencies
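
  • Base model: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Unsloth: https://github.com/unslothai/unsloth
  • PEFT: https://github.com/huggingface/peft
  • Transformers: https://github.com/huggingface/transformers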


📄 Citation

@misc{tinyllama_healthcare_2025,
  title={TinyLLaMA Healthcare Assistant via LoRA (Finetuned)},
  author={Dhivakar G},
  year={2025},
  howpublished={\url{https://huggingface.co/Dhivs/tinyllama-healthcare-4bit-lora}},
}

👀 Model Contact
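
  • Author: Dhivakar G (Dhivs on Hugging Face)
  • Model page: https://huggingface.co/Dhivs/tinyllama-healthcare-4bit-lora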


🧰 Framework Versions

  • peft: 0.16.0
  • transformers: 4.53.3
  • torch: 2.7.1+cu126
  • unsloth: 2025.7.8
  • CUDA Toolkit: 12.6
