# Qwen3 1.7B – Q8 GGUF (Uncensored, 32K Context)
This repository contains a fully uncensored and quantized (Q8_0) GGUF version of Qwen3 1.7B, designed for offline, local inference using llama.cpp and compatible runtimes.
By default, the model operates in thinking mode.
If you prefer a non-thinking (direct) response, add `/no_think` at the start of your prompt.
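If you script your prompts, the toggle can be wrapped in a tiny helper. This is a hypothetical convenience function (not part of llama.cpp or Qwen3) that simply prepends the `/no_think` tag when a direct answer is wanted:

```python
def build_prompt(user_message: str, thinking: bool = True) -> str:
    """Return the prompt text, prepending /no_think for direct (non-thinking) mode."""
    if thinking:
        return user_message
    return "/no_think " + user_message

print(build_prompt("Is 17 prime?", thinking=False))
# → /no_think Is 17 prime?
```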
- ✅ Uncensored
- ✅ 32K context length
- ✅ Q8_0 quantization
- ✅ Offline / local use
- ✅ No LoRA required (merged / base inference)
## 🔍 Model Details

- Base Model: Qwen3 1.7B
- Format: GGUF
- Quantization: Q8_0
- Context Length: 32,768 tokens (32K)
- Intended Use:
  - Offline assistants
  - Email writing
  - Small coding tasks
  - Automation
  - General daily use
- Not intended for:
  - Hosted public services
  - Safety-restricted environments
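If you want to sanity-check a downloaded file before loading it, the GGUF container starts with a small fixed header (per the GGUF specification: a 4-byte `GGUF` magic, then a little-endian uint32 version, uint64 tensor count, and uint64 metadata-KV count). A minimal sketch of reading it, shown here on synthetic bytes rather than the actual model file:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header: magic, version, tensor count, metadata-KV count."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}

# Synthetic header for demonstration (version 3, 2 tensors, 5 KV pairs):
sample = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(sample))
# → {'version': 3, 'tensor_count': 2, 'metadata_kv_count': 5}
```

To check a real file, pass the first 24 bytes of `gguf/qwen3-1.7b-q8_0.gguf` instead of the synthetic sample.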
## ▶️ Usage (llama.cpp)

```sh
./llama-cli \
  -m gguf/qwen3-1.7b-q8_0.gguf \
  -p "Hello"
```

Recommended sampling flags:

```sh
--temp 0.2 --top-p 0.9
```

For concise outputs, add a system prompt such as:

> Answer directly. Use yes or no when possible.
## ⚠️ Disclaimer

- This model is fully uncensored and provided as-is.
- You are responsible for how you use it.
- Do not deploy it in public-facing applications without moderation.
- It is intended for personal, research, and offline use only.
## 🧠 Quantization Info

- Q8_0 provides near-FP16 quality at roughly half the memory of FP16
- Stable outputs
- Recommended for CPU and mobile-class devices
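To illustrate why Q8_0 stays so close to FP16: each block of 32 weights shares one scale, and each weight is stored as a signed 8-bit integer. Below is a simplified sketch of that idea (llama.cpp's real implementation stores the scale as fp16 and works on packed blocks, which this plain-Python version does not):

```python
def quantize_q8_0_block(values):
    """Quantize one block of 32 floats, Q8_0-style: one shared scale + int8 values."""
    assert len(values) == 32
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 0.0
    quants = [round(v / scale) if scale else 0 for v in values]
    return scale, quants

def dequantize_q8_0_block(scale, quants):
    """Recover approximate floats: x ≈ q * scale."""
    return [q * scale for q in quants]

block = [i / 10.0 for i in range(-16, 16)]  # 32 sample values in [-1.6, 1.5]
scale, quants = quantize_q8_0_block(block)
restored = dequantize_q8_0_block(scale, quants)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"max round-trip error: {max_err:.4f}")
```

The round-trip error is bounded by half the scale step, which is why 8-bit blocks with per-block scales lose very little quality relative to FP16.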
## 👤 Author & Organization
- Creator: Thirumalai
- Company: ZFusionAI
## 📜 License
- Apache 2.0