Ornstein3.6-35B-A3B-RYS-GGUF

GGUF quantizations of DJLougen/Ornstein3.6-35B-A3B-RYS — the RYS-enhanced Ornstein fine-tune with layer 10 duplicated for a +49% reasoning improvement.

Full-precision model: DJLougen/Ornstein3.6-35B-A3B-RYS | Uncensored version: DJLougen/Ornstein3.6-35B-A3B-RYS-SABER

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi


Available Quantizations

| Quantization | Use Case |
|--------------|----------|
| Q8_0 | Best quality, highest memory |
| Q6_K | Near-lossless, good for 48GB+ VRAM |
| Q5_K_M | Excellent quality/size balance |
| Q5_K_S | Slightly smaller Q5 |
| Q5_0 | Legacy Q5 format |
| Q4_K_M | Recommended default for 24GB VRAM |
| Q4_K_S | Smaller Q4 variant |
| Q4_0 | Legacy Q4 format |
| Q3_K_L | Low memory, acceptable quality |
| Q3_K_M | Lower memory |
| Q3_K_S | Aggressive 3-bit |
| Q2_K | Minimum viable quality |
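
To grab just one of the files in the table above without cloning the whole repo, the huggingface_hub client can download a single quant. A minimal sketch, assuming the Ornstein3.6-35B-A3B-RYS-<quant>.gguf filename pattern used in the Usage section below; check the repo's file list if the name differs:

```python
from huggingface_hub import hf_hub_download

# Download one quant file from this repo into the local HF cache.
# The filename is inferred from the <model>-<quant>.gguf pattern;
# verify it against the repo's file list before relying on it.
model_path = hf_hub_download(
    repo_id="DJLougen/Ornstein3.6-35B-A3B-RYS-GGUF",
    filename="Ornstein3.6-35B-A3B-RYS-Q4_K_M.gguf",
)
print(model_path)  # absolute path to the cached GGUF
```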

Model Details

  • Architecture: Qwen 3.6 MoE (35B total, ~3B active per token; see the memory-sizing sketch after this list)
  • Layers: 41 (40 original + 1 RYS-duplicated layer 10)
  • Context: 262,144 tokens
  • RYS improvement: +139% math, +7.2% instruction following, +49% combined
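
Because this is an MoE, all expert weights must be resident even though only ~3B parameters are active per token, so weight memory scales with the full 35B count. A rough back-of-envelope sketch; the bits-per-weight figures below are ballpark values for llama.cpp k-quants, not measured from these files:

```python
# Rough GGUF weight-size estimate: total params x bits-per-weight / 8.
# bpw values are approximations; real files add embeddings and metadata,
# and you still need headroom for the KV cache on top of this.
PARAMS = 35e9  # total parameters, including all experts

approx_bpw = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "Q2_K": 2.7}
for quant, bpw in approx_bpw.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{quant}: ~{gib:.0f} GiB of weights")
```

This is consistent with the table above: Q4_K_M works out to roughly 20 GiB of weights, which is why it is the recommended default for 24GB cards.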

Usage

llama.cpp

llama-cli -m Ornstein3.6-35B-A3B-RYS-Q4_K_M.gguf -p "Your prompt here" -ngl 99
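
If you'd rather drive llama.cpp from Python, the llama-cpp-python bindings wrap the same backend. A minimal sketch (pip install llama-cpp-python, built with GPU support); the n_ctx value here is a deliberately modest choice, not the model's full 262,144-token window:

```python
from llama_cpp import Llama

# Load the Q4_K_M quant with all layers offloaded to GPU (-1 = all).
# n_ctx is set well below the model's 262,144-token maximum to keep
# KV-cache memory manageable; raise it as your VRAM allows.
llm = Llama(
    model_path="Ornstein3.6-35B-A3B-RYS-Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```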

Ollama

ollama run hf.co/DJLougen/Ornstein3.6-35B-A3B-RYS-GGUF:Q4_K_M
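
Once the tag above has been pulled, the same model can be called from Python through the official ollama client (pip install ollama). A minimal sketch:

```python
import ollama

# Chat against the locally pulled quant; the model tag must match
# the one used with `ollama run` above.
resp = ollama.chat(
    model="hf.co/DJLougen/Ornstein3.6-35B-A3B-RYS-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(resp["message"]["content"])
```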

License

Apache 2.0
