HIKARI-Sirius-8B-SkinDx-RAG ⭐
Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference
Named after Sirius — the brightest star in the night sky
📦 Model Type: Merged Full Model
This is a fully merged model — the LoRA adapter weights have been merged directly into the base model weights.
✅ No adapter loading needed. Load and run directly with transformers, vLLM, or SGLang, just like any standard Qwen3-VL model.
💾 Size: ~17 GB (4 safetensors shards)
🔌 If you prefer a lightweight adapter instead, see the LoRA version: E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA (~1.1 GB)
Overview
HIKARI-Sirius is our best-performing skin disease diagnosis model, fine-tuned from Qwen/Qwen3-VL-8B-Thinking on the SkinCAP Thai dermatology dataset.
The key innovation is RAG-in-Training — retrieval-augmented generation is embedded during fine-tuning itself (not only at inference). The model learns to compare a query image against retrieved reference images and their clinical captions, making it robust to visual similarity across diseases.
| Property | Value |
|---|---|
| Task | 10-class skin disease diagnosis (Stage 2 of HIKARI pipeline) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Training technique | RAG-in-Training (R2: SigLIP visual + BGE-M3 text, α=0.9) |
| Val accuracy | 85.86% (99 samples, SkinCAP 3-stage split) |
| Model type | Merged full model |
| Hardware tested | RTX 5070 Ti (16 GB VRAM) |
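
The table describes the retrieval setup only as "SigLIP visual + BGE-M3 text, α=0.9". One plausible reading, offered as an assumption rather than the authors' confirmed method, is a weighted sum of visual and text cosine similarities with α on the visual term. The sketch below uses toy 2-D vectors in place of real SigLIP and BGE-M3 embeddings; `cosine`, `fused_score`, `retrieve_top_k`, and the `REFERENCES` records are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fused_score(q_vis, q_txt, ref, alpha=0.9):
    """ASSUMED fusion rule: alpha * visual similarity + (1 - alpha) * text similarity."""
    return alpha * cosine(q_vis, ref["vis"]) + (1 - alpha) * cosine(q_txt, ref["txt"])

def retrieve_top_k(q_vis, q_txt, references, k=3, alpha=0.9):
    """Return the k best (label, score) pairs, highest score first."""
    scored = [(ref["label"], fused_score(q_vis, q_txt, ref, alpha)) for ref in references]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy reference bank; real entries would hold SigLIP and BGE-M3 embeddings.
REFERENCES = [
    {"label": "psoriasis", "vis": [1.0, 0.0], "txt": [1.0, 0.0]},
    {"label": "urticaria", "vis": [0.0, 1.0], "txt": [0.0, 1.0]},
    {"label": "atopic_dermatitis", "vis": [1.0, 1.0], "txt": [0.0, 1.0]},
]

for label, score in retrieve_top_k([1.0, 0.0], [0.0, 1.0], REFERENCES, k=2):
    print(f"{label}: {score:.3f}")
```

With α=0.9 the visual term dominates, so a reference that matches the query image ranks first even when its caption embedding does not match.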
🩺 Disease Classes (10)
| Class | Description |
|---|---|
| acne_vulgaris | Acne: comedones, papules, pustules on face/back |
| atopic_dermatitis | Eczema: chronic pruritic inflammatory skin disease |
| melanocytic_nevi | Moles: benign melanocyte proliferations |
| psoriasis | Erythematous plaques with silvery-white scale |
| sccis | Squamous cell carcinoma in situ (Bowen's disease) |
| seborrheic_dermatitis | Dandruff-related scaly patches on oily areas |
| skin_tag | Benign soft fibroepithelial pedunculated growths |
| tinea_versicolor | Fungal discoloration (hypo/hyperpigmented macules) |
| urticaria | Hives: transient wheals with erythema |
| photodermatoses | Sun-induced skin reactions |
📊 Performance vs Baselines
| Model | Accuracy | Training Method |
|---|---|---|
| Qwen3-VL-8B zero-shot | ~45% | No fine-tuning |
| HIKARI-Altair (Single FT) | 74.00% | Standard fine-tuning |
| HIKARI-Deneb (Cascade FT) | 79.80% | Cascaded pretraining |
| HIKARI-Sirius (RAG-in-Training) | 85.86% ✨ | RAG embedded at training |
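
The 85.86% figure is measured on 99 validation samples, which corresponds to 85 correct predictions (85/99 ≈ 0.8586). With a set this small it can help to quantify the uncertainty. The sketch below computes a 95% Wilson score interval; the choice of interval and the helper name `wilson_interval` are assumptions for illustration, not part of the reported evaluation.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width

# ASSUMPTION: 85.86% on 99 samples means 85 correct predictions.
low, high = wilson_interval(85, 99)
print(f"accuracy = {85 / 99:.2%}, 95% interval = [{low:.1%}, {high:.1%}]")
```

The interval spans roughly 78% to 91%, so differences of a few points between models on this split should be read with some caution.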
🔧 Usage
Stage 2 in the Full HIKARI Pipeline
📷 Image
│
▼
[Stage 1] HIKARI-Subaru-8B-SkinGroup ──► group label (4 classes)
│
▼
[Stage 2] HIKARI-Sirius-8B-SkinDx-RAG ──► disease label (10 classes) ← YOU ARE HERE
│
▼
[Stage 3] HIKARI-Vega-8B-SkinCaption-Fused ──► clinical caption
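The three stages can be chained as plain function calls. This is a minimal sketch, assuming each stage is wrapped in a callable. `run_pipeline`, `PipelineResult`, and the stage signatures (in particular what Stage 3 takes as input) are hypothetical, and the stubs stand in for the real models.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PipelineResult:
    group: str     # Stage 1 output (4 classes)
    disease: str   # Stage 2 output (10 classes)
    caption: str   # Stage 3 output

def run_pipeline(
    image: Any,
    classify_group: Callable[[Any], str],
    classify_disease: Callable[[Any, str], str],
    write_caption: Callable[[Any, str], str],
) -> PipelineResult:
    """Run the stages in order, feeding each stage's label to the next one."""
    group = classify_group(image)
    disease = classify_disease(image, group)
    caption = write_caption(image, disease)  # ASSUMED Stage 3 signature
    return PipelineResult(group, disease, caption)

# Stub stages so the sketch runs without loading any model.
result = run_pipeline(
    image=None,
    classify_group=lambda img: "inflammatory",
    classify_disease=lambda img, group: "atopic_dermatitis",
    write_caption=lambda img, disease: f"Findings consistent with {disease}.",
)
print(result)
```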
Quick Inference — transformers
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"

# Load the processor and the merged model (no adapter step needed).
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("skin_lesion.jpg").convert("RGB")
group = "inflammatory"  # from Stage 1 (HIKARI-Subaru)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT.format(group=group)},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding. temperature is ignored when do_sample=False, so it is omitted.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
result = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(result)  # → "atopic_dermatitis"
Production — vLLM BnB-4bit ⚡ (RTX 5070 Ti / 16 GB VRAM)
Throughput: 5.57 img/s at batch=4
from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"

# 4-bit bitsandbytes quantization so the model fits in 16 GB VRAM.
llm = LLM(
    model=model_id,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.88,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
sp = SamplingParams(max_tokens=64, temperature=0.0)  # greedy decoding

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

def classify_disease(image: Image.Image, group: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": PROMPT.format(group=group)},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Pass one image per vision placeholder found in the rendered prompt.
    n = max(text.count("<|vision_start|>"), 1)
    out = llm.generate({"prompt": text, "multi_modal_data": {"image": [image] * n}}, sp)
    return out[0].outputs[0].text.strip()

img = Image.open("skin_lesion.jpg").convert("RGB")
print(classify_disease(img, group="inflammatory"))  # → "atopic_dermatitis"
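The throughput figure above is for batch=4. `llm.generate` accepts a list of prompt dictionaries, so batching mostly means building that list. The helpers below are a sketch with hypothetical names (`build_requests`, `chunked`); they only prepare inputs and do not call vLLM themselves.

```python
from typing import Any, Iterator, Sequence

def build_requests(prompts: Sequence[str], images: Sequence[Any]) -> list:
    """Pair each rendered chat prompt with its image in vLLM's prompt-dict form."""
    if len(prompts) != len(images):
        raise ValueError("need exactly one image per prompt")
    return [
        {"prompt": prompt, "multi_modal_data": {"image": [image]}}
        for prompt, image in zip(prompts, images)
    ]

def chunked(items: Sequence[Any], size: int = 4) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage with the `llm` and `sp` objects from the example above:
# for batch in chunked(build_requests(texts, images), size=4):
#     outputs = llm.generate(batch, sp)
#     labels = [o.outputs[0].text.strip() for o in outputs]
```

vLLM schedules requests internally, so passing the full list in one call also works; chunking here simply mirrors the batch=4 measurement.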
Production — SGLang FP8 🚀 (maximum throughput, 9.11 img/s at batch=4)
import sglang as sgl
from transformers import AutoProcessor
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"

# FP8 quantization for maximum throughput.
engine = sgl.Engine(
    model_path=model_id,
    dtype="bfloat16",
    quantization="fp8",
    context_length=2048,
    mem_fraction_static=0.88,
    trust_remote_code=True,
    disable_cuda_graph=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

def classify_disease_sglang(image: Image.Image, group: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": PROMPT.format(group=group)},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    out = engine.generate(
        prompt=text,
        image_data=image,
        sampling_params={"max_new_tokens": 64, "temperature": 0.0},
    )
    # The engine may return a single dict or a list of dicts.
    return (out["text"] if isinstance(out, dict) else out[0]["text"]).strip()

# engine.shutdown()  # call when done
Parse Disease Label (fuzzy matching)
from rapidfuzz import process as fuzz_process

DISEASES = [
    "acne_vulgaris", "atopic_dermatitis", "melanocytic_nevi", "psoriasis",
    "sccis", "seborrheic_dermatitis", "skin_tag", "tinea_versicolor",
    "urticaria", "photodermatoses",
]

def match_disease(raw: str) -> str:
    # extractOne returns (best_match, score, index); scores range from 0 to 100.
    result, score, _ = fuzz_process.extractOne(raw.lower(), DISEASES)
    return result if score >= 50 else "unknown"

print(match_disease("The patient has atopic dermatitis"))  # → atopic_dermatitis
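Because the labels use underscores while free-text answers usually use spaces, normalizing both sides before matching can make the lookup stricter than a 50-point fuzzy threshold. The variant below is a sketch that swaps rapidfuzz for the standard library's `difflib` so it runs without extra dependencies; `match_disease_normalized` and the 0.6 cutoff are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

DISEASES = [
    "acne_vulgaris", "atopic_dermatitis", "melanocytic_nevi", "psoriasis",
    "sccis", "seborrheic_dermatitis", "skin_tag", "tinea_versicolor",
    "urticaria", "photodermatoses",
]

def _normalize(text: str) -> str:
    """Lowercase and collapse underscores, punctuation, and spaces into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def match_disease_normalized(raw: str, cutoff: float = 0.6) -> str:
    text = _normalize(raw)
    # 1) Prefer an exact occurrence of the label phrase in the answer.
    for label in DISEASES:
        if _normalize(label) in text:
            return label
    # 2) Otherwise fall back to the closest label by difflib similarity (0 to 1).
    scored = [
        (SequenceMatcher(None, _normalize(label), text).ratio(), label)
        for label in DISEASES
    ]
    score, label = max(scored)
    return label if score >= cutoff else "unknown"

print(match_disease_normalized("The patient has atopic dermatitis"))
print(match_disease_normalized("psoriasi"))
print(match_disease_normalized("xyz"))
```

The exact-phrase pass handles full-sentence answers, and the similarity fallback catches minor misspellings while sending unrelated text to "unknown".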
⚡ Speed Benchmark (RTX 5070 Ti, 16 GB VRAM — Stage 2, 64-token output)
| Engine | Batch 1 | Batch 4 | Batch-4 speedup vs Unsloth batch 1 |
|---|---|---|---|
| Unsloth 4-bit | 1,096 ms/img | 500 ms/img | baseline |
| vLLM BnB-4bit | 480 ms/img | 179 ms/img | 6.1× faster |
| SGLang FP8 | 331 ms/img | 110 ms/img ⚡ | 10× faster |
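
The speedup figures in the last column match each engine's batch-4 latency divided into the Unsloth batch-1 baseline, and per-image latency converts to throughput as 1000 / ms. The snippet below reproduces that arithmetic from the table values. Small differences from the 5.57 and 9.11 img/s quoted in the section headings are expected, since the table latencies are rounded.

```python
# (batch-1 ms/img, batch-4 ms/img), copied from the table above.
LATENCY_MS = {
    "Unsloth 4-bit": (1096, 500),
    "vLLM BnB-4bit": (480, 179),
    "SGLang FP8": (331, 110),
}

BASELINE_MS = LATENCY_MS["Unsloth 4-bit"][0]  # Unsloth at batch 1

def throughput(ms_per_image: float) -> float:
    """Images per second for a given per-image latency in milliseconds."""
    return 1000.0 / ms_per_image

def speedup_vs_baseline(engine: str) -> float:
    """Batch-4 latency of `engine` relative to the Unsloth batch-1 baseline."""
    return BASELINE_MS / LATENCY_MS[engine][1]

for engine, (_, batch4_ms) in LATENCY_MS.items():
    print(
        f"{engine}: {throughput(batch4_ms):.2f} img/s at batch 4, "
        f"{speedup_vs_baseline(engine):.1f}x vs Unsloth batch 1"
    )
```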
🔌 LoRA Adapter Version
Prefer a lightweight adapter (~1.1 GB) over the 17 GB merged model?
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration
import torch

# Load the base model, then attach the LoRA adapter on top of it.
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA")
→ E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA
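For reference, a merged checkpoint like this one can be produced from the adapter with PEFT's `merge_and_unload()`. The sketch below shows one way to do that. Whether the authors used exactly this procedure is not stated, and `merge_lora_adapter` and the output directory are hypothetical.

```python
def merge_lora_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold LoRA weights into the base model and save a standalone checkpoint."""
    # Imports are local so the helper can be defined without loading the heavy libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

    base = Qwen3VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # returns the base model with LoRA deltas applied
    merged.save_pretrained(out_dir, safe_serialization=True)
    # Save the processor alongside so the directory loads on its own.
    AutoProcessor.from_pretrained(base_id).save_pretrained(out_dir)

# Example call (downloads the base model and the adapter):
# merge_lora_adapter(
#     "Qwen/Qwen3-VL-8B-Thinking",
#     "E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA",
#     "./sirius-merged",  # hypothetical output path
# )
```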
🌟 HIKARI Model Family
| Model | Task | Metric | Type |
|---|---|---|---|
| HIKARI-Subaru-8B-SkinGroup | 4-class group classifier (Stage 1) | 88.68% | Merged |
| HIKARI-Altair-8B-SkinDx | 10-class disease dx — baseline | 74.00% | Merged + LoRA |
| HIKARI-Deneb-8B-SkinDx-Cascade | 10-class disease dx — cascade FT | 79.80% | Merged + LoRA |
| ⭐ HIKARI-Sirius-8B-SkinDx-RAG (this model) | 10-class disease dx — RAG-in-Training | 85.86% | Merged + LoRA |
| HIKARI-Polaris-8B-SkinDx-Oracle | Oracle upper bound (research only) | 59.38%* | Merged |
| HIKARI-Rigel-8B-SkinCaption | Clinical caption — checkpoint init | BLEU-4: 9.82 | Merged + LoRA |
| ⭐ HIKARI-Vega-8B-SkinCaption-Fused | Clinical caption — merged init (best) | BLEU-4: 29.33 | Merged + LoRA |
| HIKARI-Antares-8B-SkinCaption-STS | Caption + STS ablation (research) | BLEU-4: 0.61 | Merged + LoRA |
* Polaris requires ground-truth group at inference — for research comparison only.
📄 Citation
@misc{hikari2026,
  title       = {HIKARI: RAG-in-Training for Skin Disease Diagnosis with Cascaded Vision-Language Models},
  author      = {Watin Promfiy and Pawitra Boonprasart},
  year        = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang, Department of Information Technology, Bangkok, Thailand}
}
Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)
Department of Information Technology