Qwen2.5 – Internal Audit Q&A (Quantized GGUF)

This repository contains quantized GGUF-format variants of a fine-tuned Qwen2.5 model, specialized for question answering (Q&A) on internal audit data.

These models are optimized for efficient deployment in environments using llama.cpp, llama-cpp-python, or compatible inference servers (e.g., llama-server, text-generation-webui).
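As a quick start, the sketch below loads one of the quantized files with llama-cpp-python. It is a minimal sketch, not a reference implementation: the filename, context size, and sample question are assumptions, so adjust them to the variant you download.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# model-Q4_K_M.gguf is one of the variants listed below; swap in the file you use.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",
    n_ctx=4096,              # context window; raise if you have the RAM
    chat_format="chatml",    # matches the ChatML format described below
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an internal audit assistant."},
        {"role": "user", "content": "What are common findings in access control audits?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```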

Fine-Tuning Overview

  • Base Model: Qwen2.5 7B
  • Fine-Tuning Task: Instruction-based Q&A on internal audit reports, policies, and compliance logs
  • Training Data: ~100k entries from anonymized internal audit datasets (private & proprietary)
  • Format: Chat-style instruction tuning with questions and detailed answers
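For illustration, a training record in this style might look like the hypothetical example below. The actual dataset is private, so the field contents here are invented; only the chat-message shape is meaningful.

```python
# Hypothetical example of one chat-style training record.
# The real training data is private; this only illustrates the format.
example_record = {
    "messages": [
        {"role": "system", "content": "You are an internal audit assistant."},
        {"role": "user", "content": "Which policy governs quarterly vendor access reviews?"},
        {"role": "assistant", "content": "Quarterly vendor access reviews are governed by ..."},
    ]
}
```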

πŸ—ƒοΈ Quantized Variants

| Filename | Quantization | Description |
|---|---|---|
| model-Q3_K_M.gguf | Q3_K_M | 3-bit quantization; low memory footprint |
| model-Q4_K_M.gguf | Q4_K_M | 4-bit; good performance and efficiency |
| model-Q5_K_M.gguf | Q5_K_M | 5-bit; balance between performance and quality |
| model-Q6_K.gguf | Q6_K | 6-bit; high quality, higher RAM usage |
| model-Q8_0.gguf | Q8_0 | 8-bit; near original model fidelity |
| model-fp16.gguf | FP16 | Full precision; highest quality, requires GPU |

ChatML Format

Token structure

Each message in the conversation is wrapped like this:

```
<|im_start|>{role}
{message content}
<|im_end|>
```
  • `{role}` is typically `system`, `user`, or `assistant`
  • These markers delimit message boundaries so the model can reliably separate dialogue turns
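If you are calling a raw completion endpoint that does not apply a chat template for you, the prompt can be assembled by hand. The helper below is a hypothetical sketch of that assembly; `build_chatml_prompt` is not part of any library.

```python
# Hypothetical helper that assembles a ChatML prompt string by hand,
# for completion-style APIs that do not apply a chat template for you.
def build_chatml_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>")
    # Open an assistant turn so the model generates the answer next.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are an internal audit assistant."},
    {"role": "user", "content": "Summarize the scope of the Q3 compliance review."},
])
print(prompt)
```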