Qwen2.5 Internal Audit Q&A (Quantized GGUF)
This repository contains quantized GGUF-format variants of a fine-tuned Qwen2.5 model, specialized for question answering (Q&A) on internal audit data.
These models are optimized for efficient deployment in environments using llama.cpp, llama-cpp-python, or compatible inference servers (e.g., llama-server, text-generation-webui).
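As a minimal sketch of such a deployment (assuming `llama-cpp-python` is installed and one of the variants below, e.g. `model-Q4_K_M.gguf`, has been downloaded; the path, context size, and prompts are illustrative, not prescribed by this repository):

```python
def load_model(model_path: str = "model-Q4_K_M.gguf"):
    """Load a quantized GGUF variant with llama-cpp-python.

    The import is deferred so this file can be read without the library
    installed; model_path is a placeholder for whichever variant you use.
    """
    from llama_cpp import Llama

    return Llama(
        model_path=model_path,
        n_ctx=4096,            # context window
        chat_format="chatml",  # Qwen2.5 models use the ChatML template
    )


def ask(llm, question: str) -> str:
    """Run a single chat turn against an internal-audit question."""
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You answer questions about internal audit data."},
            {"role": "user", "content": question},
        ],
        max_tokens=256,
    )
    return response["choices"][0]["message"]["content"]
```

Passing `chat_format="chatml"` lets llama-cpp-python apply the ChatML wrapping described further below automatically.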
Fine-Tuning Overview
- Base Model: Qwen2.5 7B
- Fine-Tuning Task: Instruction-based Q&A on internal audit reports, policies, and compliance logs
- Training Data: ~100k entries from anonymized internal audit datasets (private and proprietary)
- Format: Chat-style instruction tuning with questions and detailed answers
 
Quantized Variants
| Filename | Quantization | Description |
|---|---|---|
| model-Q3_K_M.gguf | Q3_K_M | 3-bit quantization, low memory footprint |
| model-Q4_K_M.gguf | Q4_K_M | 4-bit, good performance and efficiency |
| model-Q5_K_M.gguf | Q5_K_M | 5-bit, balance between performance and quality |
| model-Q6_K.gguf | Q6_K | 6-bit, high quality, higher RAM usage |
| model-Q8_0.gguf | Q8_0 | 8-bit, near-original model fidelity |
| model-fp16.gguf | FP16 | Full precision, highest quality, requires GPU |
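A back-of-the-envelope way to compare these variants: GGUF file size scales roughly with parameter count times bits per weight. The helper below is our own rough estimate, not a measurement; real files differ because K-quants keep some tensors (e.g. embeddings) at higher precision and the format adds metadata.

```python
def approx_gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Ballpark GGUF file size in GB: parameters x bits / 8.

    Illustrative estimate only; actual sizes vary by quantization scheme.
    """
    return n_params_billion * bits_per_weight / 8.0
```

For the 7B model this puts Q8_0 near 7 GB and FP16 near 14 GB, which matches the table's ordering from smallest (Q3_K_M) to largest (FP16).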
ChatML Format
Token structure
Each message in the conversation is wrapped like this:

```
<|im_start|>{role}
{message content}
<|im_end|>
```

- `{role}` is usually `system`, `user`, or `assistant`
- This clearly defines message boundaries so the model can interpret dialogue turns
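When calling the model through a raw completion API rather than a chat endpoint, this wrapping can be applied by hand. A small illustrative helper (the function name is our own, not part of any library):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render [{'role': ..., 'content': ...}, ...] in the ChatML layout above,
    leaving an open assistant turn for the model to complete."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


# Example: build a prompt with a system instruction and one user question.
prompt = to_chatml([
    {"role": "system", "content": "You answer internal audit questions."},
    {"role": "user", "content": "Which controls were flagged in Q3?"},
])
```

Ending the prompt with an open `<|im_start|>assistant` turn is what cues the model to generate the answer.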
 