Qwen2.5 – Internal Audit Q&A (Quantized GGUF)

This repository contains quantized GGUF-format variants of a fine-tuned Qwen2.5 model, specialized for question answering (Q&A) on internal audit data.

These models are optimized for efficient deployment in environments using llama.cpp, llama-cpp-python, or compatible inference servers (e.g., llama-server, text-generation-webui).
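As a quick start, the sketch below loads one of the quantized files with llama-cpp-python. It is a minimal sketch, not a reference implementation: the filename, context size, and sample question are assumptions, so adjust them to the variant you download.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# model-Q4_K_M.gguf is one of the variants listed below; swap in the file you use.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",
    n_ctx=4096,              # context window; raise if you have the RAM
    chat_format="chatml",    # matches the ChatML format described below
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an internal audit assistant."},
        {"role": "user", "content": "What are common findings in access control audits?"},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```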

Fine-Tuning Overview

  • Base Model: Qwen2.5 7B
  • Fine-Tuning Task: Instruction-based Q&A on internal audit reports, policies, and compliance logs
  • Training Data: ~100k entries from anonymized internal audit datasets (private & proprietary)
  • Format: Chat-style instruction tuning with questions and detailed answers
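For illustration, a training record in this style might look like the hypothetical example below. The actual dataset is private, so the field contents here are invented; only the chat-message shape is meaningful.

```python
# Hypothetical example of one chat-style training record.
# The real training data is private; this only illustrates the format.
example_record = {
    "messages": [
        {"role": "system", "content": "You are an internal audit assistant."},
        {"role": "user", "content": "Which policy governs quarterly vendor access reviews?"},
        {"role": "assistant", "content": "Quarterly vendor access reviews are governed by ..."},
    ]
}
```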

πŸ—ƒοΈ Quantized Variants

| Filename | Quantization | Description |
|---|---|---|
| model-Q3_K_M.gguf | Q3_K_M | 3-bit quantization; low memory footprint |
| model-Q4_K_M.gguf | Q4_K_M | 4-bit; good performance and efficiency |
| model-Q5_K_M.gguf | Q5_K_M | 5-bit; balance between performance and quality |
| model-Q6_K.gguf | Q6_K | 6-bit; high quality, higher RAM usage |
| model-Q8_0.gguf | Q8_0 | 8-bit; near original model fidelity |
| model-fp16.gguf | FP16 | Full precision; highest quality, requires GPU |

ChatML Format

Token structure

Each message in the conversation is wrapped like this:

```
<|im_start|>{role}
{message content}
<|im_end|>
```
  • `{role}` is typically `system`, `user`, or `assistant`
  • These markers delimit message boundaries so the model can reliably separate dialogue turns
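If you are calling a raw completion endpoint that does not apply a chat template for you, the prompt can be assembled by hand. The helper below is a hypothetical sketch of that assembly; `build_chatml_prompt` is not part of any library.

```python
# Hypothetical helper that assembles a ChatML prompt string by hand,
# for completion-style APIs that do not apply a chat template for you.
def build_chatml_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}\n<|im_end|>")
    # Open an assistant turn so the model generates the answer next.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are an internal audit assistant."},
    {"role": "user", "content": "Summarize the scope of the Q3 compliance review."},
])
print(prompt)
```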