Instructions to use apol/med-llm-triage-es with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use apol/med-llm-triage-es with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="apol/med-llm-triage-es",
	filename="med-llm-es-triage-FP16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use apol/med-llm-triage-es with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf apol/med-llm-triage-es:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf apol/med-llm-triage-es:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf apol/med-llm-triage-es:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf apol/med-llm-triage-es:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf apol/med-llm-triage-es:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf apol/med-llm-triage-es:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf apol/med-llm-triage-es:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf apol/med-llm-triage-es:Q4_K_M

Use Docker

docker model run hf.co/apol/med-llm-triage-es:Q4_K_M

LM Studio
Jan
Ollama
How to use apol/med-llm-triage-es with Ollama:
```
ollama run hf.co/apol/med-llm-triage-es:Q4_K_M
```

Unsloth Studio new

How to use apol/med-llm-triage-es with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for apol/med-llm-triage-es to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for apol/med-llm-triage-es to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for apol/med-llm-triage-es to start chatting

Pi new

How to use apol/med-llm-triage-es with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf apol/med-llm-triage-es:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "apol/med-llm-triage-es:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use apol/med-llm-triage-es with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf apol/med-llm-triage-es:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default apol/med-llm-triage-es:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use apol/med-llm-triage-es with Docker Model Runner:
```
docker model run hf.co/apol/med-llm-triage-es:Q4_K_M
```

Lemonade

How to use apol/med-llm-triage-es with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull apol/med-llm-triage-es:Q4_K_M

Run and chat with the model

lemonade run user.med-llm-triage-es-Q4_K_M

List all available models

lemonade list

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

med-llm-es: Spanish Medical Triage LLM

End-to-end pipeline to build a fine-tuned Spanish medical triage model for offline/edge deployment.

⚠️ MEDICAL DISCLAIMER - IMPORTANT

THIS PROJECT IS FOR EDUCATIONAL AND RESEARCH PURPOSES ONLY.

❌ This is NOT a medical device
❌ This does NOT provide medical advice
❌ Do NOT use for actual patient triage
❌ The model may produce incorrect, incomplete, or harmful outputs

Required Actions:

Always recommend professional medical consultation
In real emergencies, call emergency services (112 in Europe, 911 in US)
This model should only be used for learning about LLM fine-tuning
Researchers: validate thoroughly before any downstream applications
Deployers: assume full liability for any use cases

Project Status

Phase	Status	Details
Data Preparation	✅ Complete	5000+ Spanish medical prompts
Continued Pre-Training (CPT)	✅ Complete	Medical domain adaptation
Supervised Fine-Tuning (SFT)	✅ Complete	Triage instruction tuning
Knowledge Distillation	✅ Complete	MiniMax-M2.5 teacher outputs
GRPO Training	✅ Complete	Reward-based optimization
DPO Training	✅ Complete	Preference alignment
GGUF Quantization	✅ Complete	Multiple quantization levels

What This Project Provides

Working Models

Model File	Size	Use Case
med-llm-es-triage-balanced-Q5_K_M.gguf	~800MB	Recommended - Best quality/size balance
med-llm-es-triage-balanced-Q4_K_M.gguf	~700MB	Mobile devices
med-llm-es-triage-balanced-Q2_K.gguf	~460MB	Low-resource devices

Training Datasets (Available)

Distilled data: 5000+ examples from MiniMax-M2.5
Preference data: 10K+ DPO training pairs
Balanced data: Enhanced training sets

Technical Achievements

Full RLHF pipeline (CPT → SFT → GRPO → DPO)
Offline-capable quantized models
Spanish medical language specialization
Manchester Triage System (MTS) implementation

Use Cases (Educational)

This project demonstrates how to:

Build domain-specific LLMs - Medical Spanish fine-tuning
Implement knowledge distillation - Using powerful teacher models
Apply RLHF techniques - GRPO and DPO for alignment
Optimize for edge deployment - GGUF quantization
Create safety-aligned models - Medical disclaimers and urgency levels

Pipeline

┌─────────────────────────────────────────────────────────────────────────┐
│  1. Data Prep      →  2. CPT       →  3. SFT       →  4. Distill    │
│  OpenMed + MTS      Spanish Med     Triage SFT      MiniMax-M2.5      │
│                                                                         │
│  5. GRPO        →  6. DPO       →  7. Quantize  →  8. Deploy       │
│  Rewards          Preference      GGUF Q5        Offline App         │
└─────────────────────────────────────────────────────────────────────────┘

Directory Structure

med-llm-es/
├── configs/
│   └── config.py              # Configuration settings
├── data/
│   ├── raw/                   # Downloaded OpenMed datasets
│   ├── translated/            # Spanish translations
│   ├── triage/                # Generated triage prompts
│   ├── distilled/             # Teacher-generated data (~10MB)
│   └── preference/            # DPO preference pairs (~10MB)
├── models/
│   ├── cpt-spanish-medical-v1/    # CPT model
│   ├── sft-spanish-triage-v1/     # SFT model
│   ├── grpo-spanish-triage-v1/    # GRPO model
│   ├── dpo-spanish-triage-v1/     # DPO model
│   └── gguf/                      # Quantized models (~2GB total)
├── scripts/
│   ├── 01_download_opendmed.py      # Download datasets
│   ├── 02_translate_to_spanish.py   # Translate to Spanish
│   ├── 03_generate_triage_data.py   # Create triage prompts
│   ├── 04_cpt_spanish_medical.py    # Continued Pre-Training
│   ├── 05_sft_triage.py             # Supervised Fine-Tuning
│   ├── 06_distillation_generate.py  # Knowledge Distillation
│   ├── 07_create_preference_data.py # Create DPO dataset
│   ├── 08_grpo_triage.py            # GRPO training
│   ├── 09_dpo_triage.py             # DPO training
│   ├── 10_quantize_gguf.py          # Quantization
│   └── 11_monitor_grpo.py           # Passive GRPO run monitor
├── checkpoints/               # Training checkpoints
├── reports/                   # Documentation
├── DEPLOYMENT_GUIDES.md      # Edge deployment instructions
└── README.md

Quick Start

Prerequisites

Google Colab Pro (for A100 GPU access) or local GPU (16GB+ VRAM)
MiniMax API Key (for distillation)
Google Drive (for storage)

Execution Order

Data Preparation

python scripts/01_download_opendmed.py
python scripts/02_translate_to_spanish.py
python scripts/03_generate_triage_data.py

Training (on Colab)

python scripts/04_cpt_spanish_medical.py  # CPT
python scripts/05_sft_triage.py           # SFT
python scripts/06_distillation_generate.py # Distillation
python scripts/07_create_preference_data.py # Preference data
python scripts/08_grpo_triage.py          # GRPO
python scripts/09_dpo_triage.py           # DPO

Quantization
```
python scripts/10_quantize_gguf.py
```

Triage System

Uses Manchester Triage System (MTS):

Level	Color	Meaning	Response Time
ROJO	Red	Emergency	Immediate
NARANJA	Orange	Very Urgent	10 min
AMARILLO	Yellow	Urgent	60 min
VERDE	Green	Less Urgent	120 min
AZUL	Blue	Non-urgent	240 min

Configuration

Edit configs/config.py:

BASE_MODEL = "LiquidAI/LFM2.5-1.2B-Base"
TEACHER_MODEL = "MiniMaxAI/MiniMax-M2.5"
MINIMAX_API_KEY = "your-api-key-here"

# Paths (use your drive)
DATA_DIR = "E:/med-llm-es/data"
MODELS_DIR = "E:/med-llm-es/models"

Deployment

See DEPLOYMENT_GUIDES.md for:

Android (Termux)
iOS (MLX)
Desktop
Raspberry Pi

Cost Estimate

Item	Cost
Colab Pro (80 hours)	~$100-150
MiniMax API (distillation)	~$50-100
Total	~$150-250

Limitations & Risks

Model may hallucinate - Incorrect medical information
Limited training data - Not comprehensive medical coverage
No clinical validation - Never tested in real settings
Language bias - Trained on specific Spanish variants
Quantization losses - Accuracy trade-offs from compression

License

This project is for educational/research purposes only.

Base models: LFM2.5 (Liquid AI), MiniMax-M2.5 (MiniMax)
Training: Apache 2.0 / TRL

Acknowledgments

OpenMed - Medical datasets
Liquid AI - LFM2.5 models
MiniMax - Teacher model
TRL - Training library
Unsloth - Efficient fine-tuning

Downloads last month: 73

GGUF

Model size

1B params

Architecture

lfm2

Hardware compatibility

2-bit

4-bit

5-bit

View +3 variants

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

apol
/

med-llm-triage-es