sygnif-lora-v2

LoRA adapter trained on Qwen/Qwen3.5-9B for tool-calling in Hermes <tool_call>{...}</tool_call> format, with a small voice-replay slice from a SYGNIF crypto trading agent's channeler corpus.

The adapter teaches structured function-calling grammar; specific tool names are provided at inference time via the system prompt's <tools>[…]</tools> block, not learned. This is the standard Hermes-style FC convention.
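As a rough illustration of that convention, the sketch below assembles a system prompt with a `<tools>` block from JSON schemas. The `btc_ticker` schema, the preamble text, and the `build_system_prompt` helper are hypothetical and used only for illustration; substitute your own tool definitions.

```python
import json

# Hypothetical tool schema, for illustration only; it is not shipped with the adapter.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "btc_ticker",
            "description": "Return the latest price for a trading pair.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    }
]


def build_system_prompt(tools, preamble="You are a tool-using assistant."):
    """Embed JSON tool schemas in a <tools>...</tools> block (assumed layout)."""
    return f"{preamble}\n\n<tools>\n{json.dumps(tools, indent=2)}\n</tools>"


system_prompt = build_system_prompt(TOOLS)
print(system_prompt)
```

Because tool names live only in the prompt, swapping the tool set should not require retraining the adapter.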

Training summary

Field	Value
Base model	`Qwen/Qwen3.5-9B` (Apache 2.0, ungated)
Method	QLoRA (4-bit NF4 base, double-quant) + LoRA r=16
Trainable params	29,097,984 / 8,982,901,248 = 0.32 %
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Effective batch	16 (per-device 1 × grad-accum 16)
Max seq	1536 tokens
Learning rate	2e-4, cosine schedule
Epochs	2
Steps	498
Hardware	1× RTX 4090 24 GB
Wall time	5 h 40 min
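
The figures in this table can be cross-checked with a few lines of arithmetic. The sketch below assumes the 3,969-row corpus described under "Training corpus" and assumes the trainer counts a final partial gradient-accumulation window as a full optimizer step; variable names are illustrative.

```python
import math

# Values copied from the training summary table.
trainable, total = 29_097_984, 8_982_901_248
rows, per_device, grad_accum, epochs = 3_969, 1, 16, 2
wall_time_s = 5 * 3600 + 40 * 60  # 5 h 40 min

pct_trainable = 100 * trainable / total
effective_batch = per_device * grad_accum
# Assumption: a trailing partial accumulation window still counts as one step.
steps_per_epoch = math.ceil(rows / effective_batch)
total_steps = steps_per_epoch * epochs
sec_per_step = wall_time_s / total_steps

print(f"trainable: {pct_trainable:.2f} %")
print(f"effective batch: {effective_batch}, total steps: {total_steps}")
print(f"average time per optimizer step: {sec_per_step:.1f} s")
```

Under these assumptions the computed step count matches the 498 steps reported above.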

Final metrics

Metric	Start (step 10)	End (step 490)	Best
Train loss	1.0879	0.3600	0.3276 (step 360)
Mean token accuracy	75.85 %	89.51 %	90.01 % (step 360)

The slight uptick at the very end (loss 0.33 → 0.36 over the last 80 steps) is most likely noise at the tail of the LR schedule; the model has effectively converged by step ~360.

Training corpus (3,969 rows, ChatML)

Slice	Rows	Source
Single-turn FC	~2,000	`lockon/xlam-function-calling-60k` (CC-BY-4.0 mirror of gated `Salesforce/xlam-function-calling-60k`)
Multi-turn FC + tool role	~1,500	`NousResearch/hermes-function-calling-v1` (Apache 2.0)
Voice replay	472	SYGNIF channeler corpus (private; the model author's own data)

All rows normalized to ChatML messages format with <tool_call>{...}</tool_call> in assistant turns and <tool_response>{...}</tool_response> in tool turns.
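
A minimal sketch of what such a normalization might look like for one xlam-style row follows. It assumes each row carries `query`, `tools`, and `answers` fields, with the latter two stored as JSON strings; the actual preprocessing script is not part of this card, and the helper names and system preamble are hypothetical.

```python
import json


def xlam_row_to_chatml(row):
    """Convert one xlam-style row into ChatML messages (assumed field names)."""
    tools = json.loads(row["tools"])
    calls = json.loads(row["answers"])
    system = (
        "You are a function-calling assistant.\n\n<tools>\n"
        + json.dumps(tools)
        + "\n</tools>"
    )
    # One <tool_call> block per requested call, in the assistant turn.
    assistant = "\n".join(
        "<tool_call>"
        + json.dumps({"name": c["name"], "arguments": c["arguments"]})
        + "</tool_call>"
        for c in calls
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": row["query"]},
        {"role": "assistant", "content": assistant},
    ]


def tool_turn(result):
    """Wrap a tool result for a tool-role turn."""
    return {
        "role": "tool",
        "content": "<tool_response>" + json.dumps(result) + "</tool_response>",
    }


# Hypothetical example row, for illustration only.
row = {
    "query": "What is the weather in Paris?",
    "tools": json.dumps([{"name": "get_weather", "parameters": {"city": {"type": "str"}}}]),
    "answers": json.dumps([{"name": "get_weather", "arguments": {"city": "Paris"}}]),
}
messages = xlam_row_to_chatml(row)
print(messages[2]["content"])
print(tool_turn({"temp_c": 18})["content"])
```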

Files

File	Purpose
`adapter_model.safetensors`	PEFT LoRA weights, 56 MB
`adapter_config.json`	PEFT config (r, alpha, target_modules, etc.)
`adapter_metadata.json`	Reproducibility sidecar — full hyperparams + train timestamp
`chat_template.jinja`	Qwen 3.5 chat template (incl. tool role)
`tokenizer.json` + `tokenizer_config.json`	Tokenizer files
`sygnif-lora-v2.gguf`	Same adapter in llama.cpp GGUF format, 56 MB — for `llama-server --lora`

Usage

transformers + peft

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

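# device_map="auto" places the model on the available GPU(s); it requires accelerate
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base, "gianson/sygnif-lora-v2")

messages = [
    {"role": "system", "content": "You are a tool-using assistant. Use tools when asked for live data.\n\n<tools>\n[...your tool schemas...]\n</tools>"},
    {"role": "user",   "content": "What's BTC's current price?"},
]
# return_dict=True yields input_ids plus attention_mask, both of which generate() uses
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=300)
# Decode only the newly generated tokens; keep special tokens so the tool-call tags stay visible
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
# → e.g. <tool_call>{"name":"btc_ticker","arguments":{"symbol":"BTCUSDT"}}</tool_call> (depends on the tools you supply)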
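
The decoded text still has to be turned into structured calls before anything can be executed. One possible way to do that, sketched with the standard library only, is shown below; `parse_tool_calls` is a hypothetical helper, and the sample string is made up to mirror the output format shown above.

```python
import json
import re

# Non-greedy match of everything between the Hermes-style tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name", "arguments"} dicts found in generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of raising
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


# Made-up sample output, for illustration only.
sample = '<tool_call>{"name":"btc_ticker","arguments":{"symbol":"BTCUSDT"}}</tool_call><|im_end|>'
calls = parse_tool_calls(sample)
print(calls)
```

Malformed JSON is skipped silently here; a stricter caller might prefer to log or re-prompt instead.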

llama.cpp / llama-server

llama-server \
  --model /path/to/Qwen3.5-9B-Q4_K_M.gguf \
  --lora  /path/to/sygnif-lora-v2.gguf \
  --jinja --ctx-size 4096 --port 8080 --n-gpu-layers 99

Then call /v1/chat/completions (OpenAI-compatible). With --jinja enabled, the server parses the model's Hermes-style <tool_call> output and returns it as an OpenAI-style tool_calls=[{...}] array.
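
A minimal client sketch for that endpoint, using only the standard library, could look like the following. It assumes the server from the command above is listening on localhost:8080 and that the response follows the usual OpenAI chat-completions shape; the helper names are hypothetical, and `send` is defined but not called here.

```python
import json
import urllib.request


def build_chat_request(user_text, tools, base_url="http://localhost:8080"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {"messages": [{"role": "user", "content": user_text}], "tools": tools}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tool_calls(response_json):
    """Pull (name, arguments) pairs out of an OpenAI-style response body."""
    message = response_json["choices"][0]["message"]
    pairs = []
    for call in message.get("tool_calls") or []:
        args = call["function"]["arguments"]
        # Arguments usually arrive as a JSON string; accept a dict as well.
        pairs.append((call["function"]["name"], json.loads(args) if isinstance(args, str) else args))
    return pairs


def send(request):
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request("What's BTC's current price?", tools=[])
print(req.full_url, req.get_method())
```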

Verified behavior

Emits structured tool_calls arrays, not free-form text or hallucinated tool names.
Skips tool-calling on conversational queries ("evaluate your skills"); the no-call negatives in xlam teach it to hold the line.
Returns the Qwen 3.5 thinking-mode reasoning trace in the reasoning_content field.
Final response throughput ~113 tok/s on RTX 4090, ~3 tok/s on Intel CPU (Q4 base).

Known limitations

No prior on specific tool names. The training corpus uses generic FC tool names; for your specific tools (e.g. btc.ticker, chain.balance), provide them in the system prompt's <tools>[…]</tools> block. The model dispatches what's there.
Trained on a 4-bit base (NF4 + double-quant). Retraining in bf16 at r=32 would likely add ~1–2 pp accuracy; this was skipped because no 48 GB GPU was available at training time.
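
Regarding the first limitation: since the model only emits names it finds in the `<tools>` block, the caller is responsible for mapping those names to real functions. A minimal dispatcher sketch is shown below; the registry, `make_dispatcher`, and the stub `fake_btc_ticker` are hypothetical and return no live data.

```python
def make_dispatcher(registry):
    """Return a function that routes parsed tool calls to Python callables."""

    def dispatch(call):
        name = call.get("name")
        if name not in registry:
            # Guard against names that were never offered in the <tools> block.
            return {"error": f"unknown tool: {name}"}
        args = call.get("arguments") or {}
        try:
            return {"result": registry[name](**args)}
        except TypeError as exc:
            return {"error": f"bad arguments for {name}: {exc}"}

    return dispatch


def fake_btc_ticker(symbol):
    """Stub standing in for a real price lookup."""
    return {"symbol": symbol, "price": None}


dispatch = make_dispatcher({"btc.ticker": fake_btc_ticker})
print(dispatch({"name": "btc.ticker", "arguments": {"symbol": "BTCUSDT"}}))
print(dispatch({"name": "chain.balance", "arguments": {}}))
```

Returning an error dict rather than raising lets the result be passed back to the model in a tool turn, so it can recover or rephrase.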

License

Apache 2.0 — same as the base model. Trained on:

xlam-function-calling-60k: CC-BY-4.0 (attribution to Salesforce/lockon)
hermes-function-calling-v1: Apache 2.0 (NousResearch)
channeler replay slice: original data from the model author

Commercial use, redistribution, and further fine-tuning are permitted under the Apache 2.0 terms; the CC-BY-4.0 slice carries the attribution noted above.

Citation

If you use this adapter:

@misc{sygnif-lora-v2,
  author = {Gianson},
  title = {sygnif-lora-v2: Hermes-format function-calling LoRA on Qwen 3.5 9B},
  year  = {2026},
  url   = {https://huggingface.co/gianson/sygnif-lora-v2}
}
