Improve model card: Add GitHub link and more tags

#13
by nielsr (HF Staff) - opened
Files changed (1)
README.md +12 -5
README.md CHANGED
@@ -1,14 +1,19 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - meta-llama/Llama-3.2-11B-Vision-Instruct
 datasets:
 - Xkev/LLaVA-CoT-100k
-pipeline_tag: image-text-to-text
+language:
+- en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: image-text-to-text
+tags:
+- llava
+- reasoning
+- vqa
 ---
+
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
@@ -24,6 +29,8 @@ The model was proposed in [LLaVA-CoT: Let Vision Language Models Reason Step-by-
 - **License:** apache-2.0
 - **Finetuned from model:** meta-llama/Llama-3.2-11B-Vision-Instruct
 
+**Code:** [https://github.com/PKU-YuanGroup/LLaVA-CoT](https://github.com/PKU-YuanGroup/LLaVA-CoT)
+
 ## Benchmark Results
 
 | MMStar | MMBench | MMVet | MathVista | AI2D | Hallusion | Average |
@@ -95,5 +102,5 @@ Using the same setting should accurately reproduce our results.
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
-The model may generate biased or offensive content, similar to other VLMs, due to limitations in the training data.
+The model may generate biased or offensive content, similar to other VLMs, due to limitations in the training data.
 Technically, the model's performance in aspects like instruction following still falls short of leading industry models.
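
For reference, the model card's YAML frontmatter after this change should resolve to the block below. This is reconstructed by applying the diff above; every key and value comes from the diff itself, with keys in the order shown in the new revision:

---
base_model:
- meta-llama/Llama-3.2-11B-Vision-Instruct
datasets:
- Xkev/LLaVA-CoT-100k
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- llava
- reasoning
- vqa
---

The net effect is that no metadata is lost: license, language, and pipeline_tag are moved into alphabetical key order, and the new tags plus the pipeline_tag control how the Hub indexes and surfaces the model.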