Text Generation
PEFT
Safetensors
Transformers
English
lora
structured-output
json
yaml
xml
structeval
Instructions to use astom-M/qwen3-4b-structured-output-lora-clean with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use astom-M/qwen3-4b-structured-output-lora-clean with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507") model = PeftModel.from_pretrained(base_model, "astom-M/qwen3-4b-structured-output-lora-clean") - Transformers
How to use astom-M/qwen3-4b-structured-output-lora-clean with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="astom-M/qwen3-4b-structured-output-lora-clean")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("astom-M/qwen3-4b-structured-output-lora-clean", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use astom-M/qwen3-4b-structured-output-lora-clean with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "astom-M/qwen3-4b-structured-output-lora-clean" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "astom-M/qwen3-4b-structured-output-lora-clean", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/astom-M/qwen3-4b-structured-output-lora-clean
- SGLang
How to use astom-M/qwen3-4b-structured-output-lora-clean with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "astom-M/qwen3-4b-structured-output-lora-clean" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "astom-M/qwen3-4b-structured-output-lora-clean", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "astom-M/qwen3-4b-structured-output-lora-clean" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "astom-M/qwen3-4b-structured-output-lora-clean", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use astom-M/qwen3-4b-structured-output-lora-clean with Docker Model Runner:
docker model run hf.co/astom-M/qwen3-4b-structured-output-lora-clean
qwen3-4b-structured-output-lora-v3 (FIXED)
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using standard PeFT + Transformers with 4-bit quantization.
โ ๏ธ This repository contains LoRA adapter weights only. The base model must be loaded separately.
Version 3: Critical Template Alignment Fix
This version fixes the critical template mismatch that caused v1/v2 to output explanatory text:
Key Fixes
- Template Alignment:
add_generation_prompt=True(matches vLLM inference) - User-Ending Prompts: Training prompts end with user message (not assistant)
- Response-Only Loss: Loss applied only to response part, prompt is masked
- Proper Learning Rate: 2e-06 (stronger than v2's 5e-07)
Why v1/v2 Failed
- v1/v2: Used
add_generation_prompt=Falseduring training - vLLM: Uses
add_generation_prompt=Trueduring inference - Result: Model saw different prompt formats โ output explanatory text
v3 Results
- Training loss: ~1.12-1.37 (vs v1/v2's ~1.96)
- Expected: <1% explanatory text rate (vs v1's 28.7%, v2's 45.3%)
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA with 4-bit quantization (standard PeFT + Transformers)
- Max sequence length: 512
- Epochs: 1
- Learning rate: 2e-06 (proper learning strength)
- LoRA parameters: r=64, alpha=128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch size: 2 ร 8 (gradient accumulation) = 16 effective
- Training loss: ~1.12-1.37 (final)
- Training time: ~14 minutes on RTX 5090
Dataset
- Source: u-10bei/structured_data_with_cot_dataset_512_v2
- Preprocessing: Removed "Approach:" sections and "Output:" markers
- Size: 3,933 examples โ 3,736 train / 197 validation
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "astom-M/qwen3-4b-structured-output-lora-clean"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
base,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)
# For vLLM inference (recommended):
# Use the standard inference notebook provided by competition organizers
Technical Details
Template Alignment Fix
Training (v3):
# Prompt: system + user messages only
prompt_text = tokenizer.apply_chat_template(
prompt_messages,
add_generation_prompt=True # โ KEY FIX
)
# Response: assistant content (raw structured data)
# Labels: Mask prompt part, only learn response part
Inference (vLLM):
# Exactly matches training format
tokenizer.apply_chat_template(
messages,
add_generation_prompt=True # โ Now aligned!
)
Sources & License
- Base Model: Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0)
- Training Data: u-10bei/structured_data_with_cot_dataset_512_v2 (preprocessed)
- LoRA Adapter: Apache 2.0 (same as base model)
Notes
- Trained for Matsuo Institute LLM Course Main Competition (StructEval-T)
- Version 3: Fixed template alignment - critical fix for structured output
- Designed to output clean structured data without explanatory text
- Best used with temperature=0.0 for deterministic outputs
Framework Versions
- PEFT 0.18.1
- Transformers 4.56.2
- PyTorch 2.10.0+cu128
- Downloads last month
- 1
Model tree for astom-M/qwen3-4b-structured-output-lora-clean
Base model
Qwen/Qwen3-4B-Instruct-2507 Finetuned
unsloth/Qwen3-4B-Instruct-2507