Guo Chen's picture

Guo Chen

cg1177

·

cg1177

AI & ML interests

None yet

Recent Activity

liked a model about 1 month ago

lulidong/AV-Reasoner-7B

upvoted a paper about 1 month ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

authored a paper about 2 months ago

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

View all activity

Organizations

upvoted a paper about 1 month ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 258

upvoted 2 papers about 2 months ago

Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

Paper • 2506.06253 • Published Jun 6 • 9

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

Paper • 2506.05328 • Published Jun 5 • 20

upvoted a paper 3 months ago

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

Paper • 2504.15271 • Published Apr 21 • 66

upvoted a paper 7 months ago

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Paper • 2412.12075 • Published Dec 16, 2024 • 1

upvoted a paper 8 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 160

upvoted 2 papers 11 months ago

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published Aug 29, 2024 • 51

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28, 2024 • 88

upvoted 2 papers about 1 year ago

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27, 2024 • 55

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2, 2024 • 123

upvoted 6 papers over 1 year ago

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25, 2024 • 55

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25, 2024 • 80

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Paper • 2404.04125 • Published Apr 4, 2024 • 30

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22, 2024 • 27

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20, 2024 • 78

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 16