REVERSE-v1.5-7B

Model Summary

REVERSE-v1.5-7B is an open-source vision-language model (VLM) that performs both next-token prediction and self-verification / self-correction during generation. Built on top of LLaVA-v1.5-7B, it is fine-tuned on the REVERSE Visual Instruct 1.3M dataset and equipped with a retrospective resampling mechanism that lets it detect and correct hallucinations as it generates. The model was trained in early March 2025.
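At a high level, generation interleaves ordinary decoding with a confidence check and backtracks when a span looks unreliable. The loop below is a minimal sketch of that idea, not the released implementation: the model API (`next_token_probs`), the special-token ids, and the backtracking policy are illustrative assumptions; see the paper and repository for the real mechanism.

```python
import numpy as np

# Hypothetical special-token ids; the actual model uses learned
# confidence-marker tokens from its extended vocabulary.
UNCONFIDENT_ID = 32001
SPAN_END_ID = 32002
EOS_ID = 2

def generate_with_retrospective_resampling(model, prompt_ids, tau=0.003,
                                           max_new_tokens=256, max_retries=5):
    tokens = list(prompt_ids)
    span_start = len(tokens)   # where the current (unverified) span begins
    retries = 0

    for _ in range(max_new_tokens):
        # Hypothetical API: probability distribution over the next token.
        probs = model.next_token_probs(tokens)

        # Self-verification: if the model assigns more than tau probability
        # to the "unconfident" marker, treat the current span as a likely
        # hallucination, backtrack to its start, and resample it.
        if probs[UNCONFIDENT_ID] > tau and retries < max_retries:
            tokens = tokens[:span_start]
            retries += 1
            continue

        # Stochastic sampling, so a rejected span can be regenerated
        # differently on the next attempt.
        next_id = int(np.random.choice(len(probs), p=probs))
        tokens.append(next_id)

        if next_id == SPAN_END_ID:   # span accepted as confident
            span_start = len(tokens)
            retries = 0
        if next_id == EOS_ID:
            break
    return tokens
```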

Performance

REVERSE achieves state-of-the-art hallucination reduction across a wide range of captioning and open-ended visual question answering benchmarks:

| Benchmark | Metric | Best Baseline | REVERSE (τ=0.003) | REVERSE (τ=0.0003) |
|---|---|---|---|---|
| CHAIR-MSCOCO | CHAIRi (↓) | HA-DPO (11.0) | 10.3 | 6.1 |
| CHAIR-MSCOCO | CHAIRs (↓) | EOS (38.2) | 37.0 | 13.6 |
| AMBER-G | Hallucination (↓) | EOS (5.1) | 6.0 | 4.0 |
| AMBER-G | Coverage (↑) | HALVA (53.0) | 52.2 | 26.9 |
| MMHal-Bench | Score (↑) | DoLA (2.33) | 2.56 | 3.28 |
| MMHal-Bench | Hallucination Rate (↓) | HACL (0.50) | 0.47 | 0.30 |
| HaloQuest | Avg. Accuracy (↑) | HALVA (23.9) | 30.7 | 32.3 |
| HaloQuest | False Premise Acc. (↑) | HALVA (21.1) | 31.8 | 29.4 |
| HaloQuest | Visually Challenging Acc. (↑) | DoLA (40.1) | 31.5 | 18.7 |
| HaloQuest | Insufficient Context Acc. (↑) | HALVA (10.7) | 26.9 | 58.8 |

Here, τ is the detection threshold applied to the model's self-verification signal during generation: a smaller τ triggers correction more aggressively, reducing hallucination at the cost of coverage. REVERSE also performs competitively on discriminative tasks compared with the base VLM.

| Benchmark | Metric | LLaVA-v1.5-7B | REVERSE (τ=0.5) |
|---|---|---|---|
| AMBER-D | F1 Score (↑) | 74.7 | 74.2 |
| POPE | F1 Score (↑) | 85.9 | 85.9 |
| MME-Hall | Score (↑) | 648.3 | 601.6 |
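For context, POPE-style discriminative benchmarks pose binary yes/no questions about image content and report F1 over the answers. The sketch below shows the conventional F1 computation with "yes" as the positive class; the answer-parsing and label conventions are illustrative assumptions, not the official evaluation code.

```python
def binary_f1(predictions, references, positive="yes"):
    """F1 over yes/no answers, treating `positive` as the positive class."""
    pairs = list(zip(predictions, references))
    tp = sum(p == positive and r == positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# F1 is reported as a percentage in the table above.
print(100 * binary_f1(["yes", "no", "yes"], ["yes", "yes", "yes"]))  # 80.0
```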

Usage

Please refer to the installation guide on GitHub to get started:
👉 Installation Guide
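As a quick-start sketch: since the checkpoint follows the LLaVA-v1.5 layout, loading it through the standard LLaVA loader should look roughly like the following. The exact entry points may differ in the authors' fork, so treat the installation guide as authoritative.

```python
# Sketch only: assumes the standard LLaVA-v1.5 loading path; verify the
# actual entry points against the REVERSE repository.
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "tsunghanwu/reverse_llava_v15"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)
```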

Intended Use

Primary Use Cases:

  • Reducing hallucination in image captioning and VQA tasks
  • Benchmarking hallucination-aware generation
  • Research on grounded vision-language generation and self-correction

Target Users:
Researchers, developers, and students working in computer vision, NLP, and multimodal AI.
