
sygnif-lora-v2

LoRA adapter trained on Qwen/Qwen3.5-9B for tool-calling in Hermes <tool_call>{...}</tool_call> format, with a small voice-replay slice from a SYGNIF crypto trading agent's channeler corpus.

The adapter teaches structured function-calling grammar; specific tool names are provided at inference time via the system prompt's <tools>[…]</tools> block, not learned. This is the standard Hermes-style FC convention.
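As a minimal sketch of that convention, the system prompt can be assembled by serializing the tool schemas into a `<tools>` block. The helper name `build_system_prompt` and the `btc_ticker` schema below are hypothetical and only illustrate the shape; they are not part of this repository.

```python
import json


def build_system_prompt(instructions: str, tools: list[dict]) -> str:
    """Append a Hermes-style <tools> block listing the available tool schemas."""
    tools_json = json.dumps(tools, indent=2)
    return f"{instructions}\n\n<tools>\n{tools_json}\n</tools>"


# Hypothetical tool schema, for illustration only.
example_tools = [
    {
        "type": "function",
        "function": {
            "name": "btc_ticker",
            "description": "Return the latest price for a trading pair.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    }
]

system_prompt = build_system_prompt(
    "You are a tool-using assistant. Use tools when asked for live data.",
    example_tools,
)
print(system_prompt)
```

Because the tool list lives in the prompt, swapping tools should only require changing `example_tools`, not retraining.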

Training summary

| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen3.5-9B (Apache 2.0, ungated) |
| Method | QLoRA (4-bit NF4 base, double-quant) + LoRA r=16 |
| Trainable params | 29,097,984 / 8,982,901,248 = 0.32 % |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Effective batch | 16 (per-device 1 × grad-accum 16) |
| Max seq | 1536 tokens |
| Learning rate | 2e-4, cosine schedule |
| Epochs | 2 |
| Steps | 498 |
| Hardware | 1× RTX 4090 24 GB |
| Wall time | 5 h 40 min |
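The figures above can be cross-checked with a little arithmetic. This sketch assumes that all 3,969 corpus rows (see the training corpus section) were used for training on a single GPU and that the final partial batch of each epoch counts as one optimizer step; neither assumption is stated explicitly in this card.

```python
import math

# Values taken from the training summary.
trainable_params = 29_097_984
total_params = 8_982_901_248
per_device_batch = 1
grad_accum = 16
epochs = 2
rows = 3_969  # corpus size; assumed to be entirely training data

trainable_pct = 100 * trainable_params / total_params
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(rows / effective_batch)  # partial last batch counted
total_steps = steps_per_epoch * epochs

print(f"{trainable_pct:.2f} %")                       # 0.32 %
print(effective_batch, steps_per_epoch, total_steps)  # 16 249 498
```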

Final metrics

| Metric | Start (step 10) | End (step 490) | Best |
| --- | --- | --- | --- |
| Train loss | 1.0879 | 0.3600 | 0.3276 (step 360) |
| Mean token accuracy | 75.85 % | 89.51 % | 90.01 % (step 360) |

The slight uptick at the very end (loss 0.33 → 0.36 over the last 80 steps) is most likely noise in the tail of the cosine LR schedule; the model has effectively converged by step ~360.

Training corpus (3,969 rows, ChatML)

| Slice | Rows | Source |
| --- | --- | --- |
| Single-turn FC | ~2,000 | lockon/xlam-function-calling-60k (CC-BY-4.0 mirror of gated Salesforce/xlam-function-calling-60k) |
| Multi-turn FC + tool role | ~1,500 | NousResearch/hermes-function-calling-v1 (Apache 2.0) |
| Voice replay | 472 | SYGNIF channeler corpus (private; the model author's own data) |

All rows normalized to ChatML messages format with <tool_call>{...}</tool_call> in assistant turns and <tool_response>{...}</tool_response> in tool turns.
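A minimal sketch of what such a normalization could look like for a single-turn xlam row. It assumes each row carries `query`, `tools`, and `answers` fields, with the last two stored as JSON-encoded strings; the function names are hypothetical and this is not the author's actual preprocessing script.

```python
import json


def xlam_row_to_chatml(row: dict, system_instructions: str) -> list[dict]:
    """Convert one xlam-style row into ChatML messages with Hermes tool tags."""
    tools = json.loads(row["tools"])    # assumed: JSON string of tool schemas
    calls = json.loads(row["answers"])  # assumed: JSON string of expected calls
    system = f"{system_instructions}\n\n<tools>\n{json.dumps(tools)}\n</tools>"
    assistant = "\n".join(
        f"<tool_call>{json.dumps(call)}</tool_call>" for call in calls
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": row["query"]},
        {"role": "assistant", "content": assistant},
    ]


def tool_turn(result: dict) -> dict:
    """Wrap a tool result as a ChatML tool-role message."""
    return {
        "role": "tool",
        "content": f"<tool_response>{json.dumps(result)}</tool_response>",
    }


# Made-up row for illustration.
row = {
    "query": "What is the price of BTC?",
    "tools": json.dumps([{"name": "get_price", "parameters": {"symbol": "str"}}]),
    "answers": json.dumps([{"name": "get_price", "arguments": {"symbol": "BTCUSDT"}}]),
}
messages = xlam_row_to_chatml(row, "You are a tool-using assistant.")
messages.append(tool_turn({"symbol": "BTCUSDT", "price": 0.0}))
```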

Files

| File | Purpose |
| --- | --- |
| adapter_model.safetensors | PEFT LoRA weights, 56 MB |
| adapter_config.json | PEFT config (r, alpha, target_modules, etc.) |
| adapter_metadata.json | Reproducibility sidecar: full hyperparameters and training timestamp |
| chat_template.jinja | Qwen 3.5 chat template (incl. tool role) |
| tokenizer.json + tokenizer_config.json | Tokenizer files |
| sygnif-lora-v2.gguf | Same adapter in llama.cpp GGUF format, 56 MB, for llama-server --lora |

Usage

transformers + peft

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base, "gianson/sygnif-lora-v2")

messages = [
    {"role": "system", "content": "You are a tool-using assistant. Use tools when asked for live data.\n\n<tools>\n[...your tool schemas...]\n</tools>"},
    {"role": "user",   "content": "What's BTC's current price?"},
]
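# return_dict=True returns input_ids plus attention_mask, so generate() receives both.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=300)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
# Example output (the tool name comes from the schemas you supply):
# <tool_call>{"name":"btc_ticker","arguments":{"symbol":"BTCUSDT"}}</tool_call>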
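
To act on the generated text, the `<tool_call>` blocks need to be pulled out and decoded as JSON. The following is a small sketch of one way to do that; `extract_tool_calls` is a hypothetical helper, and it silently skips malformed payloads, which may or may not be the behavior you want.

```python
import json
import re

# Non-greedy match so several <tool_call> blocks in one reply are split correctly.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed {"name": ..., "arguments": ...} call in text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than raising
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


sample = '<tool_call>{"name":"btc_ticker","arguments":{"symbol":"BTCUSDT"}}</tool_call>'
print(extract_tool_calls(sample))
# [{'name': 'btc_ticker', 'arguments': {'symbol': 'BTCUSDT'}}]
```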

llama.cpp / llama-server

llama-server \
  --model /path/to/Qwen3.5-9B-Q4_K_M.gguf \
  --lora  /path/to/sygnif-lora-v2.gguf \
  --jinja --ctx-size 4096 --port 8080 --n-gpu-layers 99

Then send requests to /v1/chat/completions (OpenAI-compatible). The model emits Hermes-style <tool_call> blocks, which llama-server (with --jinja) parses into the tool_calls=[{...}] array of the response.
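
A minimal client sketch using only the standard library. It assumes the server from the command above is listening on localhost:8080; the `model` value is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(
    user_text: str,
    tools: list[dict],
    base_url: str = "http://localhost:8080",  # assumed to match --port 8080 above
) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request that offers tools."""
    payload = {
        "model": "local",  # placeholder name, not a real model identifier
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "max_tokens": 300,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_tool_calls(response_body: dict) -> list[dict]:
    """Return the tool_calls list of the first choice, or [] if there is none."""
    message = response_body["choices"][0]["message"]
    return message.get("tool_calls") or []


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a server running, `read_tool_calls(send(build_chat_request("What's BTC's current price?", tools)))` would return the parsed calls, where `tools` is your own list of OpenAI-style function schemas.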

Verified behavior

  • Emits structured tool_calls arrays rather than free-form text or hallucinated tool names.
  • Skips tool-calling on conversational queries ("evaluate your skills"); the negative examples in xlam teach it to abstain.
  • Returns the Qwen 3.5 thinking-mode reasoning trace in the reasoning_content field.
  • Final-response throughput is ~113 tok/s on an RTX 4090 and ~3 tok/s on an Intel CPU (Q4 base).

Known limitations

  • No prior on specific tool names. The training corpus uses generic FC tool names; for your specific tools (e.g. btc.ticker, chain.balance), provide them in the system prompt's <tools>[…]</tools> block. The model calls whichever tools are listed there.
  • Trained on a 4-bit base (NF4 + double-quant). Retraining in bf16 at r=32 would likely add ~1–2 pp accuracy; this was skipped because no 48 GB GPU was available at training time.
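
Since tool names come only from the prompt, it can be worth checking each emitted call against the tools you actually registered before executing it. The sketch below shows one possible guard; `dispatch`, `fake_ticker`, and the registry layout are all hypothetical and not part of this adapter.

```python
from typing import Any, Callable


def dispatch(call: dict, registry: dict[str, Callable[..., Any]]) -> dict:
    """Run a parsed tool call only if its name is in the registry."""
    name = call.get("name")
    if name not in registry:
        return {"error": f"unknown tool: {name}"}
    arguments = call.get("arguments") or {}
    if not isinstance(arguments, dict):
        return {"error": "arguments must be a JSON object"}
    try:
        return {"name": name, "result": registry[name](**arguments)}
    except TypeError as exc:
        # Raised for missing or unexpected keyword arguments
        # (and for any TypeError inside the tool itself).
        return {"error": f"bad arguments for {name}: {exc}"}


def fake_ticker(symbol: str) -> dict:
    """Stand-in tool for illustration; returns a dummy price."""
    return {"symbol": symbol, "price": 0.0}


registry = {"btc.ticker": fake_ticker}
ok = dispatch({"name": "btc.ticker", "arguments": {"symbol": "BTCUSDT"}}, registry)
unknown = dispatch({"name": "chain.balance", "arguments": {}}, registry)
```

Returning an error dictionary instead of raising lets the caller feed the failure back to the model as a tool response, which is one common design choice rather than a requirement.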

License

Apache 2.0 — same as the base model. Trained on:

  • xlam-function-calling-60k: CC-BY-4.0 (attribution to Salesforce/lockon)
  • hermes-function-calling-v1: Apache 2.0 (NousResearch)
  • channeler replay slice: original data (the model author)

Commercial use, redistribution, and further fine-tuning are permitted under the terms of Apache 2.0.

Citation

If you use this adapter:

@misc{sygnif-lora-v2,
  author = {Gianson},
  title = {sygnif-lora-v2: Hermes-format function-calling LoRA on Qwen 3.5 9B},
  year  = {2026},
  url   = {https://huggingface.co/gianson/sygnif-lora-v2}
}