sygnif-lora-v2
LoRA adapter trained on Qwen/Qwen3.5-9B for tool-calling in
Hermes <tool_call>{...}</tool_call>
format, with a small voice-replay slice from a SYGNIF crypto trading agent's
channeler corpus.
The adapter teaches structured function-calling grammar; specific tool
names are provided at inference time via the system prompt's <tools>[…]</tools>
block, not learned. This is the standard Hermes-style FC convention.
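As a rough illustration of that convention, the system prompt can be assembled from a list of tool schemas. This is a minimal sketch: the helper name `build_system_prompt` and the OpenAI-style schema shape are assumptions for illustration, not something the adapter requires.

```python
import json


def build_system_prompt(instruction: str, tools: list[dict]) -> str:
    """Append a <tools>...</tools> block that lists the available tool schemas."""
    tools_json = json.dumps(tools, indent=2)
    return f"{instruction}\n\n<tools>\n{tools_json}\n</tools>"


# Hypothetical schema for illustration; replace with your real tools.
example_tools = [
    {
        "type": "function",
        "function": {
            "name": "btc_ticker",
            "description": "Return the latest price for a trading pair.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    }
]

system_prompt = build_system_prompt(
    "You are a tool-using assistant. Use tools when asked for live data.",
    example_tools,
)
print(system_prompt)
```

Because tool names live only in the prompt, swapping the tool list should not require retraining the adapter.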
Training summary
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B (Apache 2.0, ungated) |
| Method | QLoRA (4-bit NF4 base, double-quant) + LoRA r=16 |
| Trainable params | 29,097,984 / 8,982,901,248 = 0.32 % |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Effective batch | 16 (per-device 1 × grad-accum 16) |
| Max seq | 1536 tokens |
| Learning rate | 2e-4, cosine schedule |
| Epochs | 2 |
| Steps | 498 |
| Hardware | 1× RTX 4090 24 GB |
| Wall time | 5 h 40 min |
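The figures in the table can be sanity-checked with a few lines of arithmetic. The sketch below assumes the 3,969-row corpus size stated in the training-corpus section, and assumes the trainer rounds a partial final batch up to a whole optimizer step in each epoch; the actual trainer settings are not given here.

```python
import math

trainable_params = 29_097_984
total_params = 8_982_901_248
rows = 3_969  # corpus size stated in the training-corpus section
per_device_batch = 1
grad_accum = 16
epochs = 2

effective_batch = per_device_batch * grad_accum
trainable_pct = 100 * trainable_params / total_params

# Assumption: a partial final batch still counts as one optimizer step per epoch.
steps_per_epoch = math.ceil(rows / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"effective batch: {effective_batch}")
print(f"trainable: {trainable_pct:.2f} %")
print(f"steps: {total_steps}")
```

Under those assumptions the script reproduces the effective batch of 16, the 0.32 % trainable fraction, and the 498 steps listed above.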
Final metrics
| Metric | Start (step 10) | End (step 490) | Best |
|---|---|---|---|
| Train loss | 1.0879 | 0.3600 | 0.3276 (step 360) |
| Mean token accuracy | 75.85 % | 89.51 % | 90.01 % (step 360) |
Slight uptick at the very end (loss 0.33 → 0.36 over the last 80 steps) is LR-schedule-tail noise; the model is at convergence by step ~360.
Training corpus (3,969 rows, ChatML)
| Slice | Rows | Source |
|---|---|---|
| Single-turn FC | ~2,000 | lockon/xlam-function-calling-60k (CC-BY-4.0 mirror of gated Salesforce/xlam-function-calling-60k) |
| Multi-turn FC + tool role | ~1,500 | NousResearch/hermes-function-calling-v1 (Apache 2.0) |
| Voice replay | 472 | SYGNIF channeler corpus (private; the model author's own data) |
All rows normalized to ChatML messages format with <tool_call>{...}</tool_call>
in assistant turns and <tool_response>{...}</tool_response> in tool turns.
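A normalization step of this kind might look like the sketch below. The raw record layout (`system`, `query`, `calls`, `result`) is hypothetical; the actual preprocessing script and source field names are not part of this card.

```python
import json


def wrap_tool_call(name: str, arguments: dict) -> str:
    """Serialize one call as a Hermes-style <tool_call> block."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>{payload}</tool_call>"


def wrap_tool_response(result: dict) -> str:
    """Serialize one tool result as a <tool_response> block."""
    return f"<tool_response>{json.dumps(result)}</tool_response>"


def to_chatml(record: dict) -> list[dict]:
    """Convert one raw record (hypothetical layout) into ChatML-style messages."""
    messages = [
        {"role": "system", "content": record["system"]},
        {"role": "user", "content": record["query"]},
    ]
    for call in record["calls"]:
        messages.append(
            {"role": "assistant", "content": wrap_tool_call(call["name"], call["arguments"])}
        )
        # Multi-turn rows also carry the tool's reply in a "tool" turn.
        if "result" in call:
            messages.append({"role": "tool", "content": wrap_tool_response(call["result"])})
    return messages


record = {
    "system": "You are a tool-using assistant.",
    "query": "What's BTC's current price?",
    "calls": [
        {"name": "btc_ticker", "arguments": {"symbol": "BTCUSDT"}, "result": {"status": "ok"}}
    ],
}
for message in to_chatml(record):
    print(message["role"], "->", message["content"])
```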
Files
| File | Purpose |
|---|---|
| adapter_model.safetensors | PEFT LoRA weights, 56 MB |
| adapter_config.json | PEFT config (r, alpha, target_modules, etc.) |
| adapter_metadata.json | Reproducibility sidecar: full hyperparams + train timestamp |
| chat_template.jinja | Qwen 3.5 chat template (incl. tool role) |
| tokenizer.json + tokenizer_config.json | Tokenizer files |
| sygnif-lora-v2.gguf | Same adapter in llama.cpp GGUF format, 56 MB; for llama-server --lora |
Usage
transformers + peft
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the bf16 base model and its tokenizer, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base, "gianson/sygnif-lora-v2")

messages = [
    {"role": "system", "content": "You are a tool-using assistant. Use tools when asked for live data.\n\n<tools>\n[...your tool schemas...]\n</tools>"},
    {"role": "user", "content": "What's BTC's current price?"},
]

# return_dict=True yields input_ids plus attention_mask as a dict of tensors.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=300)

# Decode only the newly generated tokens, keeping the <tool_call> markers.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
# → <tool_call>{"name":"btc_ticker","arguments":{"symbol":"BTCUSDT"}}</tool_call>
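With raw transformers output the tool call arrives as text, so the caller has to extract it. The following is a minimal parsing sketch, assuming each call is a single JSON object wrapped in `<tool_call>` tags as in the sample output above; `extract_tool_calls` is a hypothetical helper name.

```python
import json
import re

# Non-greedy match so several <tool_call> blocks in one reply are split correctly.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the decoded model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1).strip()))
        except json.JSONDecodeError:
            # Skip malformed payloads instead of failing the whole reply.
            continue
    return calls


sample = '<tool_call>{"name":"btc_ticker","arguments":{"symbol":"BTCUSDT"}}</tool_call>'
for call in extract_tool_calls(sample):
    print(call["name"], call["arguments"])
```

An empty list means the model answered conversationally and no tool should be dispatched.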
llama.cpp / llama-server
llama-server \
  --model /path/to/Qwen3.5-9B-Q4_K_M.gguf \
  --lora /path/to/sygnif-lora-v2.gguf \
  --jinja --ctx-size 4096 --port 8080 --n-gpu-layers 99
Then hit /v1/chat/completions (OpenAI-compatible) — the model emits
tool_calls=[{...}] per the Hermes convention.
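A client for that endpoint could be sketched with the standard library alone. The example assumes the server started above is listening on localhost:8080 and that the response follows the OpenAI chat-completions shape; the helper names and the `max_tokens` value are illustrative.

```python
import json
import urllib.request


def build_chat_request(user_text: str, tools: list[dict]) -> dict:
    """Build an OpenAI-style chat-completions payload with tool schemas attached."""
    return {
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "max_tokens": 300,
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tool_calls_from_response(response: dict) -> list[dict]:
    """Return the tool_calls of the first choice, or an empty list if there are none."""
    message = response["choices"][0]["message"]
    return message.get("tool_calls") or []


# Usage (needs a running server, so it is left commented out):
# reply = post_chat(build_chat_request("What's BTC's current price?", tools=[...]))
# print(tool_calls_from_response(reply))
```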
Verified behavior
- Emits structured tool_calls arrays, not free-form text or hallucinated tool names.
- Skips tool-calling on conversational queries ("evaluate your skills"); the negative examples in xlam teach it to hold the line.
- Reasoning trace via Qwen 3.5 thinking-mode in the reasoning_content field.
- Final response throughput ~113 tok/s on RTX 4090, ~3 tok/s on Intel CPU (Q4 base).
Known limitations
- No prior on specific tool names. The training corpus uses generic FC tool names; for your specific tools (e.g. btc.ticker, chain.balance), provide them in the system prompt's <tools>[…]</tools> block. The model dispatches what's there.
- Trained at 4-bit base (NF4 + double-quant). bf16 retraining at r=32 would likely add ~1–2 pp accuracy; it was skipped here because no 48 GB GPU was in stock at training time.
License
Apache 2.0 — same as the base model. Trained on:
- xlam-function-calling-60k: CC-BY-4.0 (attribution to Salesforce/lockon)
- hermes-function-calling-v1: Apache 2.0 (NousResearch)
- channeler replay slice: original data (the model author)
No restrictions on commercial use, redistribution, or further fine-tuning.
Citation
If you use this adapter:
@misc{sygnif-lora-v2,
author = {Gianson},
title = {sygnif-lora-v2: Hermes-format function-calling LoRA on Qwen 3.5 9B},
year = {2026},
url = {https://huggingface.co/gianson/sygnif-lora-v2}
}