YannQi

https://yannqi.github.io/

yannqi

AI & ML interests

Computer vision, AGI, Multi-modality.

authored 3 papers 10 days ago

Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering

Paper • 2510.14605 • Published Oct 16 • 4

Taming Modality Entanglement in Continual Audio-Visual Segmentation

Paper • 2510.17234 • Published Oct 20 • 4

HunyuanOCR Technical Report

Paper • 2511.19575 • Published 13 days ago • 19

authored a paper 3 months ago

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Paper • 2508.21113 • Published Aug 28 • 110

authored 3 papers 6 months ago

Continuous Speculative Decoding for Autoregressive Image Generation

Paper • 2411.11925 • Published Nov 18, 2024 • 16

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

Paper • 2505.15431 • Published May 21 • 1

Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger

Paper • 2506.07785 • Published Jun 9 • 1

authored 3 papers about 1 year ago

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06135 • Published Sep 10, 2024 • 16

AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation

Paper • 2408.01708 • Published Aug 3, 2024 • 4

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Paper • 2312.06462 • Published Dec 11, 2023