RAGognizer: Hallucination-Aware Fine-Tuning

RAGognizer logo

Model Summary

RAGognizer is a hallucination-aware fine-tuning approach that integrates a lightweight detection head into an LLM, enabling joint optimization of language modeling and hallucination detection.

This model is based on mistralai/Mistral-7B-Instruct-v0.3 and has been fine-tuned on the RAGognize dataset to provide token-level hallucination probabilities during generation.

Abstract

Retrieval-Augmented Generation (RAG) is widely used to augment the input to Large Language Models (LLMs) with external information, such as recent or domain-specific knowledge. Nonetheless, current models still produce closed-domain hallucinations and generate content that is unsupported by the retrieved context. Existing detection approaches typically treat hallucination as a post-hoc problem, relying on black-box consistency checks or probes over frozen internal representations. In this work, we demonstrate that hallucination detection based on internal state representations can also serve as a direct training signal. We introduce RAGognize, a dataset of naturally occurring closed-domain hallucinations with token-level annotations, and RAGognizer, a hallucination-aware fine-tuning approach that integrates a lightweight detection head into an LLM, allowing for the joint optimization of language modeling and hallucination detection. This joint objective forces the model to improve the separability of its internal states regarding hallucinations while simultaneously learning to generate well-formed and meaningful responses. Across multiple benchmarks, RAGognizer achieves state-of-the-art token-level hallucination detection while substantially reducing hallucination rates during generation, without degrading language quality or relevance.
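The joint objective described above can be written schematically as a weighted sum of the standard language-modeling loss and a token-level detection loss. The notation below (weight λ, token labels y_t, predicted probabilities p̂_t) is ours for illustration, not taken from the paper:

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{LM}} + \lambda \, \mathcal{L}_{\mathrm{det}},
\qquad
\mathcal{L}_{\mathrm{det}} = -\frac{1}{T}\sum_{t=1}^{T}
\Big[ y_t \log \hat{p}_t + (1 - y_t)\log\big(1 - \hat{p}_t\big) \Big]
```

Here y_t ∈ {0, 1} marks whether token t is annotated as hallucinated, and p̂_t is the detection head's predicted hallucination probability for that token; λ balances detection against generation quality.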

Usage

This model is designed to be used with the ragognizer library, which handles the loading of the attached detection heads and LoRA adapters.

Installation

WARNING: The library pins old, unmaintained dependency versions to ensure reproducibility with the paper.

git clone https://github.com/F4biian/RAGognizer.git
cd RAGognizer/ragognizer
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -U pip
pip install -e .

Inference Example

from ragognizer.detectors.RAGognizer import RAGognizer

# Initialize the detector
# Ensure you are running on CUDA if available
detector = RAGognizer(
    ragognizer_repo_name="F4biian/RAGognizer-Mistral-7B-Instruct-v0.3",
    device="cuda",
    use_postprocessor=False # Set to True only for the Qwen3-4B variant
)

# Define a chat / context
chat = [
    {"role": "user", "content": "Context: The wall is green. Based solely on the context: What color is the wall?"},
    {"role": "assistant", "content": "The color of the wall is gray."},
]

# Get token-level hallucination scores
scores = detector.predict(chat=chat, token_level=True)

# Output contains tokens, probabilities, and (optionally) binary predictions
print(scores)
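A common follow-up step is to flag tokens whose hallucination probability exceeds a threshold. The exact output format of `detector.predict` is defined by the ragognizer library; the `(token, probability)` pairs below are a hypothetical stand-in, so adapt the field access to the real structure:

```python
# Hypothetical token-level scores, mimicking the chat above.
# The real ragognizer output format may differ; this is only a post-processing sketch.
scores = [("The", 0.02), ("color", 0.03), ("of", 0.01), ("the", 0.02),
          ("wall", 0.04), ("is", 0.05), ("gray", 0.91), (".", 0.10)]

THRESHOLD = 0.5  # decision boundary for the binary hallucination label

# Collect tokens likely unsupported by the retrieved context.
flagged = [tok for tok, p in scores if p >= THRESHOLD]
print(flagged)  # → ['gray']
```

In the example chat, the context states the wall is green, so a well-calibrated detector should assign high probability to the unsupported token "gray".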

Model Details

  • Base Model: mistralai/Mistral-7B-Instruct-v0.3
  • Fine-Tuning Method: LoRA with transformer_heads integration.
  • Task: Causal Language Modeling + Token-Level Binary Classification (Hallucination Detection).
  • Precision: bfloat16 (recommended).
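Conceptually, the detection head is a lightweight per-token classifier over the backbone's hidden states. The NumPy sketch below illustrates that idea with random weights and a toy sequence; it is not the actual transformer_heads implementation, and the dimensions are assumptions (4096 is Mistral-7B's hidden size):

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size, seq_len = 4096, 8  # Mistral-7B hidden size; toy sequence length
# Stand-in for the backbone's last-layer hidden states, one row per token.
hidden_states = rng.standard_normal((seq_len, hidden_size)).astype(np.float32)

# Lightweight detection head: a single linear layer + sigmoid, applied per token.
W = (rng.standard_normal((hidden_size, 1)) * 0.01).astype(np.float32)
b = np.zeros(1, dtype=np.float32)

logits = hidden_states @ W + b          # shape (seq_len, 1)
probs = 1.0 / (1.0 + np.exp(-logits))   # token-level hallucination probabilities

print(probs.shape)  # one probability per token
```

During fine-tuning, a head of this form is trained jointly with the LoRA-adapted backbone, so the gradient of the detection loss also shapes the backbone's internal states.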

Intended Use

  1. RAG Verification: Detect hallucinations in RAG pipelines at the token level.
  2. Benchmarking: Evaluate internal state separability regarding closed-domain hallucinations.
  3. Generation: The model acts as a standard instruction-tuned LLM but with reduced closed-domain hallucination rates compared to the base model.

Citation

TODO