Sourav Mishra's picture

11

Sourav Mishra

srvmishra832

·

AI & ML interests

LLMs and VLMs

Organizations

upvoted 11 papers 4 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 211

Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory

Paper • 2508.09736 • Published Aug 13, 2025 • 57

Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published Aug 20, 2025 • 38

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 64

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published Feb 20, 2025 • 29

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Paper • 2506.04614 • Published Jun 5, 2025 • 19

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20, 2024 • 175

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 267

VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26, 2025 • 139

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160