- Training Data:
  - Pre-train: Over 40M samples drawn from a mixture of video, image, and text data, with 20.4M open-source and 19.8M in-house, detailed as follows:

<div align="center">
  <img src="assets/tarsier2_training_dataset.png" width = "75%">
  <br>Figure 1: Summary of datasets used in the pre-training stage of Tarsier2.
</div>
Tarsier2-Recap-7b was trained in December 2024.

## Performance
Tarsier2-7B excels in various video understanding tasks, including video captioning, video question answering, video grounding, and hallucination testing.

<div align="center">
  <img src="assets/performance_of_tarsier2.png" width = "75%">
  <br>Figure 2: Performance comparison of Tarsier2 with previous SOTA models at the 7B scale and GPT-4o.
</div>