AI & ML interests

Computer Vision


prithivMLmods 
posted an update 3 days ago
Excited to introduce the new experimental model "Qwen2.5-VL-7B-Abliterated-Caption-it", which is performing exceptionally well on image captioning tasks. This variant is specifically tailored for Abliterated Captioning and Uncensored Image Captioning. It is designed to generate highly detailed and descriptive captions across a broad range of visual categories, including images with complex, sensitive, or nuanced content, while handling varying aspect ratios and resolutions. 🧪🤗

✨ Try the demo here : prithivMLmods/Qwen2.5-VL
✨ Qwen2.5-VL-7B-Abliterated-Caption-it : prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it
✨ Multimodal VLMs : prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
✨ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
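As a rough sketch of how a captioning variant like this is typically run, assuming it follows the standard Qwen2.5-VL chat interface in 🤗 transformers (the helper names and the default prompt here are illustrative, not from the model card):

```python
# Hypothetical inference sketch for a Qwen2.5-VL captioning variant.
# Assumes the standard Qwen2.5-VL chat template from Hugging Face transformers.

def build_caption_messages(image_path: str, prompt: str = "Describe this image in detail."):
    """Build the chat-format message list Qwen2.5-VL processors expect."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

def caption_image(image_path: str) -> str:
    # Heavy imports are kept local so the helper above stays importable without a GPU.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model_id = "prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it"
    processor = AutoProcessor.from_pretrained(model_id)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")

    messages = build_caption_messages(image_path)
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[Image.open(image_path)], return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=256)
    trimmed = out[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```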

To learn more, visit the respective model card.
prithivMLmods 
posted an update 4 days ago
olmOCR [Allen AI] just got an upgrade! 📈🧑‍🍳

The allenai/olmOCR-7B-0725, fine-tuned with allenai/olmOCR-mix-0225 on top of Qwen/Qwen2.5-VL-7B-Instruct, pushes the boundaries of OCR technology. It takes a single document image as input, with the longest side resized to 1288 pixels. A high-quality, openly available approach to parsing PDFs and other complex documents via optical character recognition.

Try the demo here: prithivMLmods/Multimodal-OCR

✨ Model: allenai/olmOCR-7B-0725
✨ Model [fp8]: allenai/olmOCR-7B-0725-FP8
✨ Multimodal Implementations Space Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
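The input-size note above (longest side resized to 1288 pixels) can be sketched as a small preprocessing step with Pillow; the helper name is illustrative:

```python
from PIL import Image

TARGET_LONGEST_SIDE = 1288  # per the olmOCR-7B-0725 input spec quoted above

def resize_longest_side(img: Image.Image, target: int = TARGET_LONGEST_SIDE) -> Image.Image:
    """Scale the image so its longest side equals `target`, preserving aspect ratio."""
    w, h = img.size
    scale = target / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
```

A 2576x1000 page, for example, comes out at 1288x500, so both portrait and landscape scans end up within the model's expected input size.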

To learn more, visit the respective model card.
prithivMLmods 
posted an update 7 days ago
Upgraded the step-by-step notebook for fine-tuning SigLIP2 on domain-specific image classification tasks. The notebook supports both datasets with predefined train/test splits and those with only a train split, making it suitable for low-resource, custom, and real-world classification scenarios. 📢👉

➺ FineTuning-SigLIP2-Notebook : prithivMLmods/FineTuning-SigLIP2-Notebook

➺ GitHub : https://github.com/PRITHIVSAKTHIUR/FineTuning-SigLIP-2

➺ In the first scenario, the dataset includes predefined train and test splits, enabling conventional supervised learning and generalization evaluation : prithivMLmods/FineTuning-SigLIP2-Notebook (.ipynb)

➺ In the second scenario, only a training split is available; in such cases, the training set is either partially reserved for validation or reused entirely for evaluation : prithivMLmods/FineTuning-SigLIP2-Notebook (.ipynb)

This flexibility supports experimentation in constrained or domain-specific settings, where standard test annotations may not exist.
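The train-only scenario above can be sketched as a simple held-out split. This mirrors what `datasets.Dataset.train_test_split` does under the hood; the 10% ratio and the helper name are illustrative choices, not taken from the notebook:

```python
import random

def holdout_split(examples: list, test_size: float = 0.1, seed: int = 42):
    """Shuffle a train-only dataset and carve off a held-out evaluation set."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(idx) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return [examples[i] for i in train_idx], [examples[i] for i in test_idx]
```

Reusing the entire training set for evaluation, as the second option in the notebook describes, only measures fit rather than generalization, so a held-out fraction like this is the safer default when labels are plentiful.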