tsunghanwu committed on
Commit
b3fb3b1
verified
1 Parent(s): 60f6e50

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -23,7 +23,7 @@ REVERSE-Qwen2.5-VL-3B is a novel open-source vision-language model (VLM) that pe
  REVERSE achieves **state-of-the-art hallucination reduction** across diverse captioning and open-ended visual question answering benchmarks. To ensure the apple-to-apple comparison, we fine-tune the released Qwen2.5-VL-3B model using both the LLaVA-FT setup and our REVERSE recipe, applying both on the same 100k subset. This allows us to directly compare the impact of our method against the LLaVA-FT baseline under consistent conditions as the Qwen2.5-VL's instruction tuning data is not publicly available.
 
  | Benchmark | Metric | Qwen2.5-VL-FT | REVERSE (τ=0.01) |
- | ------------ | ----------------------------- | ---------------- | ----------------- | ------------------ |
  | CHAIR-MSCOCO | CHAIRi (↓) | 12.2 | **10.5** |
  | | CHAIRs (↓) | 45.8 | **39.4** |
  | AMBER-G | CHAIR (↓) | 7.7 | **7.5** |
 
  REVERSE achieves **state-of-the-art hallucination reduction** across diverse captioning and open-ended visual question answering benchmarks. To ensure the apple-to-apple comparison, we fine-tune the released Qwen2.5-VL-3B model using both the LLaVA-FT setup and our REVERSE recipe, applying both on the same 100k subset. This allows us to directly compare the impact of our method against the LLaVA-FT baseline under consistent conditions as the Qwen2.5-VL's instruction tuning data is not publicly available.
 
  | Benchmark | Metric | Qwen2.5-VL-FT | REVERSE (τ=0.01) |
+ | ------------ | ----------------------------- | ---------------- | ----------------- |
  | CHAIR-MSCOCO | CHAIRi (↓) | 12.2 | **10.5** |
  | | CHAIRs (↓) | 45.8 | **39.4** |
  | AMBER-G | CHAIR (↓) | 7.7 | **7.5** |
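
Note on the change: the removed separator row had five cells under a four-column header, and the replacement has four. GitHub Flavored Markdown only recognizes a table when the delimiter row has the same number of cells as the header row, so a mismatch like this can prevent the table from rendering. The following is a minimal sketch of a check for that condition. The helper names `count_cells` and `separator_matches_header` are hypothetical and not part of this repository, and the cell counting is simplified (it does not handle escaped pipes such as `\|`).

```python
def count_cells(row: str) -> int:
    """Count the cells in a pipe-delimited Markdown table row.

    Simplification (assumption): escaped pipes inside cells are not handled.
    """
    stripped = row.strip()
    # Drop the optional leading and trailing pipes before splitting.
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return len(stripped.split("|"))


def separator_matches_header(header: str, separator: str) -> bool:
    """Return True when the separator row has as many cells as the header."""
    return count_cells(header) == count_cells(separator)


# Rows copied from the diff above.
header = "| Benchmark | Metric | Qwen2.5-VL-FT | REVERSE (τ=0.01) |"
old_separator = (
    "| ------------ | ----------------------------- | ---------------- "
    "| ----------------- | ------------------ |"
)
new_separator = (
    "| ------------ | ----------------------------- | ---------------- "
    "| ----------------- |"
)

print(separator_matches_header(header, old_separator))  # old row: mismatch
print(separator_matches_header(header, new_separator))  # new row: match
```

Running a check like this over a README before uploading would flag the old separator row and accept the new one.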