REVERSE achieves **state-of-the-art hallucination reduction** across diverse captioning and open-ended visual question answering benchmarks. To ensure an apples-to-apples comparison, we fine-tune the released Qwen2.5-VL-3B model with both the LLaVA-FT setup and our REVERSE recipe, applying both to the same 100k subset. Since Qwen2.5-VL's instruction-tuning data is not publicly available, this lets us directly compare the impact of our method against the LLaVA-FT baseline under consistent conditions.

| Benchmark    | Metric     | Qwen2.5-VL-FT | REVERSE (τ=0.01) |
| ------------ | ---------- | ------------- | ---------------- |
| CHAIR-MSCOCO | CHAIRi (↓) | 12.2          | **10.5**         |
|              | CHAIRs (↓) | 45.8          | **39.4**         |
| AMBER-G      | CHAIR (↓)  | 7.7           | **7.5**          |
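For intuition, the CHAIR scores reported above can be sketched roughly as follows. This is a minimal illustration, not the official evaluation script: it assumes object mentions have already been extracted per caption, and it skips the MSCOCO synonym-list normalization that the real benchmark applies. CHAIRi is the fraction of mentioned object instances that are not in the image; CHAIRs is the fraction of captions containing at least one such hallucinated object.

```python
def chair_metrics(mentioned_objects, gt_objects):
    """Compute (CHAIRi, CHAIRs) hallucination rates.

    mentioned_objects: per-caption list of object words extracted from the caption
    gt_objects: per-caption set of objects actually present in the image
    """
    total_mentions = 0         # all object mentions across captions
    hallucinated_mentions = 0  # mentions of objects absent from the image
    hallucinated_captions = 0  # captions with >= 1 hallucinated object
    for mentions, gt in zip(mentioned_objects, gt_objects):
        total_mentions += len(mentions)
        bad = [m for m in mentions if m not in gt]
        hallucinated_mentions += len(bad)
        if bad:
            hallucinated_captions += 1
    chair_i = hallucinated_mentions / max(total_mentions, 1)
    chair_s = hallucinated_captions / max(len(mentioned_objects), 1)
    return chair_i, chair_s


# Two captions: the first mentions a "frisbee" not present in the image.
ci, cs = chair_metrics(
    mentioned_objects=[["dog", "frisbee"], ["cat"]],
    gt_objects=[{"dog"}, {"cat"}],
)
# ci = 1/3 (one of three mentions hallucinated), cs = 0.5 (one of two captions)
```

Lower is better for both metrics, which is why the (↓) annotation appears in the table.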