
Improve model card: Add pipeline tag, library_name, paper, code, usage, and additional tags

#1 opened by nielsr (HF Staff)

This PR enhances the model card for Senqiao/VisionThink-General by:

  • Adding pipeline_tag: image-text-to-text to enable better discoverability for multimodal tasks on the Hugging Face Hub.
  • Specifying library_name: transformers as the model is compatible with the Hugging Face Transformers library.
  • Including additional relevant tags such as vision-language-model, multimodal, and qwen.
  • Providing a detailed description of the model, summarizing its core contributions from the paper.
  • Including a direct link to the official paper: VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning.
  • Adding a direct link to the official GitHub repository for the code: https://github.com/dvlab-research/VisionThink.
  • Incorporating key highlights of the model's capabilities.
  • Adding installation instructions and a practical Python code snippet for quick inference using transformers.
  • Including the citation and acknowledgement sections.
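For reference, the metadata changes listed above would live in the YAML front matter at the top of README.md. A minimal sketch of what that block could look like, using only the fields and tags named in this PR description (the exact diff may differ):

```yaml
---
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - vision-language-model
  - multimodal
  - qwen
---
```

The `pipeline_tag` and `library_name` keys drive the Hub's task filters and the "Use in Transformers" widget, while the free-form `tags` improve search discoverability.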

Together, these additions make the model more discoverable, better documented, and easier to use for researchers and practitioners.

Cannot merge
This branch has merge conflicts in the following files:
  • README.md
