Joy Caption Pre Alpha
Generate captions for images
Segment and caption objects in images and videos
Generate descriptions by uploading images or videos
Generate insights from charts using text prompts
Generate descriptions and answers about images
Upload an image to detect objects
Extract text and metadata from PDF files
Try PaliGemma on document understanding tasks
Generate text descriptions from images
Interact with a chatbot that understands text and images
Generate text by uploading images and asking questions
GPT-4o-like bot.
Extract text from documents using images or PDFs
Generate detailed descriptions from images and videos
Generate document retrieval queries from an image
Microsoft Phi-3 Vision 128k with Multimodal capabilities
A Fully Open Multilingual Multimodal LLM for 39 Languages
Demo for DocLayout-YOLO
A data extraction tool to convert PDF to Markdown and JSON
Extract text from images
Hugging Face Space for JanusFlow-1.3B
Generate clickable coordinates on a screenshot
PaliGemma2 LoRA fine-tuned on VQAv2
Gaze detection using Moondream
Detect and estimate human poses in images and videos
nanonets ocr2 / olmocr / qwen2vl ocr / aya vision / rolmocr
Extract text from images and PDFs
OmniParser: turn your LLM into a GUI agent
See, read, and reason—better together.
Generate text and segment images using PaliGemma 2
Interact with the Aya family of models.
Interact with videos!
Classify images in real-time using your webcam
OCR for PDFs and Images using Mistral OCR
Upload an image to detect objects
Object Detection & Scene Understanding for Images and Video
Describe masked parts of images
Object Detection on Images and Video
Start camera to get descriptions based on instructions
Seed1.5-VL API Demo
Demo for Nanonets-OCR
Chat with images, videos, or PDFs to generate text
THUDM/GLM-4.1V-9B-Thinking Demo
Generate text responses from images and text input
Extract and visualize layout from PDFs or images