Shuhuai Ren

ShuhuaiRen

https://renshuhuai-andy.github.io/

AI & ML interests

NLP, Multi-modal

Recent Activity

upvoted a paper 3 days ago

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

liked a model 5 months ago

XiaomiMiMo/MiMo-Audio-Tokenizer

upvoted a collection 5 months ago

MiMo-Audio

Organizations

authored a paper 8 months ago

MiMo-VL Technical Report

Paper • 2506.03569 • Published Jun 4, 2025 • 80

authored a paper 9 months ago

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12, 2025 • 82

authored 2 papers 11 months ago

Next Block Prediction: Video Generation via Semi-Autoregressive Modeling

Paper • 2502.07737 • Published Feb 11, 2025 • 9

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20, 2025 • 34

authored 3 papers about 1 year ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published Dec 16, 2024 • 60

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Paper • 2311.17404 • Published Nov 29, 2023 • 1

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published Dec 19, 2024 • 53

authored 9 papers over 1 year ago

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

Paper • 2304.04704 • Published Apr 10, 2023

Delving into the Openness of CLIP

Paper • 2206.01986 • Published Jun 4, 2022

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

Paper • 2310.02071 • Published Oct 3, 2023 • 4

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

Paper • 2310.19060 • Published Oct 29, 2023

DCA: Diversified Co-Attention towards Informative Live Video Commenting

Paper • 1911.02739 • Published Nov 7, 2019

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Paper • 2402.15527 • Published Feb 21, 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Paper • 2404.10763 • Published Apr 16, 2024

TempCompass: Do Video LLMs Really Understand Videos?

Paper • 2403.00476 • Published Mar 1, 2024 • 1

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26

authored a paper almost 2 years ago

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Paper • 2312.02051 • Published Dec 4, 2023 • 1

authored a paper over 2 years ago

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

Paper • 2306.04387 • Published Jun 7, 2023 • 8
