- Training Data:
  - Pre-train: Over 40M samples drawn from a mixture of video, image, and text data, with 20.4M open-source and 19.8M in-house, detailed as follows:

<div align="center">
  <img src="assets/tarsier2_training_dataset.png" width = "75%">
  <br>Figure 1: Summary of datasets used in the pre-training stage of Tarsier2.
</div>
Tarsier2-Recap-7b was trained in December 2024.

## Performance
Tarsier2-7B excels in various video understanding tasks, including video captioning, video question answering, video grounding, and hallucination testing.

<div align="center">
  <img src="assets/performance_of_tarsier2.png" width = "75%">
  <br>Figure 2: Performance comparison of Tarsier2 with previous SOTA models at the 7B scale and GPT-4o.
</div>