DaoanZhang

DwanZhang

AI & ML interests

None yet

Recent Activity

new activity 15 days ago

guyuchao/Mira:It seems that it need password to decrypt this dataset?

upvoted a paper about 10 hours ago

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Paper • 2512.22905 • Published 6 days ago • 15

upvoted a paper 16 days ago

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Paper • 2512.16915 • Published 16 days ago • 37

upvoted a paper 22 days ago

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published 23 days ago • 33

upvoted a paper 26 days ago

Unified Video Editing with Temporal Reasoner

Paper • 2512.07469 • Published 26 days ago • 45

upvoted 2 papers about 1 month ago

UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

Paper • 2511.18050 • Published Nov 22, 2025 • 37

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 112

upvoted a paper about 2 months ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 37

upvoted a paper 5 months ago

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 14

upvoted a paper 7 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306

upvoted 4 papers 8 months ago

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20, 2025 • 133

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7, 2025 • 82

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

Paper • 2505.01490 • Published May 2, 2025 • 5

upvoted a paper 9 months ago

Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31, 2025 • 76

upvoted an article 9 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4, 2025 • 1.31k

upvoted a paper about 1 year ago

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

Paper • 2410.05255 • Published Oct 7, 2024 • 5

upvoted 2 papers over 1 year ago

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

Paper • 2307.05628 • Published Jul 11, 2023 • 10

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Paper • 2402.00827 • Published Feb 1, 2024 • 2

upvoted a collection over 1 year ago

LLaVa-NeXT

LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19, 2024 • 32

upvoted a paper over 1 year ago

DreamReward: Text-to-3D Generation with Human Preference

Paper • 2403.14613 • Published Mar 21, 2024 • 37