Instructions to use astom-M/qwen3-4b-structured-output-lora-clean with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use astom-M/qwen3-4b-structured-output-lora-clean with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(base_model, "astom-M/qwen3-4b-structured-output-lora-clean")

Transformers

How to use astom-M/qwen3-4b-structured-output-lora-clean with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="astom-M/qwen3-4b-structured-output-lora-clean")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("astom-M/qwen3-4b-structured-output-lora-clean", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use astom-M/qwen3-4b-structured-output-lora-clean with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "astom-M/qwen3-4b-structured-output-lora-clean"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "astom-M/qwen3-4b-structured-output-lora-clean",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/astom-M/qwen3-4b-structured-output-lora-clean

SGLang

How to use astom-M/qwen3-4b-structured-output-lora-clean with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "astom-M/qwen3-4b-structured-output-lora-clean" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "astom-M/qwen3-4b-structured-output-lora-clean",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "astom-M/qwen3-4b-structured-output-lora-clean" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "astom-M/qwen3-4b-structured-output-lora-clean",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use astom-M/qwen3-4b-structured-output-lora-clean with Docker Model Runner:
```
docker model run hf.co/astom-M/qwen3-4b-structured-output-lora-clean
```

qwen3-4b-structured-output-lora-v3 (FIXED)

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using standard PeFT + Transformers with 4-bit quantization.

⚠️ This repository contains LoRA adapter weights only. The base model must be loaded separately.

Version 3: Critical Template Alignment Fix

This version fixes the critical template mismatch that caused v1/v2 to output explanatory text:

Key Fixes

Template Alignment: add_generation_prompt=True (matches vLLM inference)
User-Ending Prompts: Training prompts end with user message (not assistant)
Response-Only Loss: Loss applied only to response part, prompt is masked
Proper Learning Rate: 2e-06 (stronger than v2's 5e-07)

Why v1/v2 Failed

v1/v2: Used add_generation_prompt=False during training
vLLM: Uses add_generation_prompt=True during inference
Result: Model saw different prompt formats → output explanatory text

v3 Results

Training loss: ~1.12-1.37 (vs v1/v2's ~1.96)
Expected: <1% explanatory text rate (vs v1's 28.7%, v2's 45.3%)

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: LoRA with 4-bit quantization (standard PeFT + Transformers)
Max sequence length: 512
Epochs: 1
Learning rate: 2e-06 (proper learning strength)
LoRA parameters: r=64, alpha=128
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Batch size: 2 × 8 (gradient accumulation) = 16 effective
Training loss: ~1.12-1.37 (final)
Training time: ~14 minutes on RTX 5090

Dataset

Source: u-10bei/structured_data_with_cot_dataset_512_v2
Preprocessing: Removed "Approach:" sections and "Output:" markers
Size: 3,933 examples → 3,736 train / 197 validation

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "astom-M/qwen3-4b-structured-output-lora-clean"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

# For vLLM inference (recommended):
# Use the standard inference notebook provided by competition organizers

Technical Details

Template Alignment Fix

Training (v3):

# Prompt: system + user messages only
prompt_text = tokenizer.apply_chat_template(
    prompt_messages,
    add_generation_prompt=True  # ← KEY FIX
)
# Response: assistant content (raw structured data)
# Labels: Mask prompt part, only learn response part

Inference (vLLM):

# Exactly matches training format
tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True  # ← Now aligned!
)

Sources & License

Base Model: Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0)
Training Data: u-10bei/structured_data_with_cot_dataset_512_v2 (preprocessed)
LoRA Adapter: Apache 2.0 (same as base model)

Notes

Trained for Matsuo Institute LLM Course Main Competition (StructEval-T)
Version 3: Fixed template alignment - critical fix for structured output
Designed to output clean structured data without explanatory text
Best used with temperature=0.0 for deterministic outputs

Framework Versions

PEFT 0.18.1
Transformers 4.56.2
PyTorch 2.10.0+cu128

Downloads last month: 1

Model tree for astom-M/qwen3-4b-structured-output-lora-clean

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

unsloth/Qwen3-4B-Instruct-2507

Adapter

(409)

this model