Image-to-Text
Transformers
Safetensors
English
florence2
image-text-to-text
finetune
VQA
VLM
custom_code
How to use prithivMLmods/Florence-2-VLM-Doc-VQA with Transformers:
# Use a pipeline as a high-level helper.
# Warning: pipeline type "image-to-text" is no longer supported in transformers v5.
# Load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="prithivMLmods/Florence-2-VLM-Doc-VQA", trust_remote_code=True)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/Florence-2-VLM-Doc-VQA", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/Florence-2-VLM-Doc-VQA", trust_remote_code=True)
Visual Question Answering Model
This model is a fine-tuned version of microsoft/Florence-2-base-ft designed for Visual Question Answering (VQA). It has been optimized for tasks where the model interprets images and responds to questions about the visual content.
Model Details
- Finetuned by: prithivMLmods
- Model type: Visual Question Answering (VQA)
- Language(s): English (NLP component)
- License: None specified
- Finetuned from model: microsoft/Florence-2-base-ft
Usage
This model can be used to perform VQA tasks, where it takes an image and a question about the image as input, and returns an answer based on the visual content.
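The image-plus-question flow described above might look roughly like the following sketch. It reuses the loading calls shown earlier in this card. Everything else is an assumption for illustration: the `<DocVQA>` task prefix follows a common Florence-2 document-VQA fine-tuning recipe and is not confirmed for this checkpoint, the helper names (`build_prompt`, `answer_question`, `load_model_and_processor`) are hypothetical, and `invoice.png` is a placeholder path.

```python
from typing import Any

MODEL_ID = "prithivMLmods/Florence-2-VLM-Doc-VQA"
# Assumption: the checkpoint expects a "<DocVQA>" task prefix before the
# question. The model card does not state the prompt format.
TASK_PREFIX = "<DocVQA>"


def build_prompt(question: str, task_prefix: str = TASK_PREFIX) -> str:
    """Prepend the (assumed) task prefix to a whitespace-trimmed question."""
    return f"{task_prefix}{question.strip()}"


def load_model_and_processor(model_id: str = MODEL_ID):
    """Load the model and processor with the calls shown in this card."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForImageTextToText.from_pretrained(model_id, trust_remote_code=True)
    model.eval()
    return model, processor


def answer_question(image: Any, question: str, model: Any, processor: Any,
                    max_new_tokens: int = 64) -> str:
    """Run one image/question pair through the model and return decoded text."""
    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
    )
    # Decode the first (and only) sequence in the batch.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


# Example call (downloads the model; "invoice.png" is a hypothetical file):
# from PIL import Image
# model, processor = load_model_and_processor()
# image = Image.open("invoice.png").convert("RGB")
# print(answer_question(image, "What is the invoice total?", model, processor))
```

If the answers come back empty or echo the prompt, the assumed task prefix is the first thing to check; pass a different `task_prefix` to `build_prompt` or drop it entirely.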