
Improve model card: Add pipeline tag, library_name, paper, code, usage, and additional tags

#1 opened by nielsr (HF Staff)

This PR enhances the model card for Senqiao/VisionThink-General by:

  • Adding pipeline_tag: image-text-to-text to enable better discoverability for multimodal tasks on the Hugging Face Hub.
  • Specifying library_name: transformers as the model is compatible with the Hugging Face Transformers library.
  • Including additional relevant tags such as vision-language-model, multimodal, and qwen.
  • Providing a detailed description of the model, summarizing its core contributions from the paper.
  • Including a direct link to the official paper: VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning.
  • Adding a direct link to the official GitHub repository for the code: https://github.com/dvlab-research/VisionThink.
  • Incorporating key highlights of the model's capabilities.
  • Adding installation instructions and a practical Python code snippet for quick inference using transformers.
  • Including the citation and acknowledgement sections.
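For reference, the metadata changes listed above would live in the YAML front matter at the top of README.md. A minimal sketch of what that block could look like, using only the fields and tags named in this PR description (the exact diff may differ):

```yaml
---
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - vision-language-model
  - multimodal
  - qwen
---
```

The `pipeline_tag` and `library_name` keys drive the Hub's task filters and the "Use in Transformers" widget, while the free-form `tags` improve search discoverability.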

Together, these additions make the model more discoverable, better documented, and easier to use for researchers and practitioners.

Cannot merge
This branch has merge conflicts in the following files:
  • README.md
