Xiangtai Li

LXT

https://lxtgh.github.io/

AI & ML interests

Computer Vision, Multi-Modal Understanding, Generative AI

Recent Activity

upvoted a paper 11 days ago

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

upvoted a paper 12 days ago

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

upvoted a paper 19 days ago

Visual Spatial Tuning

authored 3 papers 5 months ago

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Paper • 2505.24164 • Published May 30

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Paper • 2506.13691 • Published Jun 16 • 2

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Paper • 2507.07999 • Published Jul 10 • 49

authored 10 papers 6 months ago

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Paper • 2505.23606 • Published May 29 • 14

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

Paper • 2505.16862 • Published May 22

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

Paper • 2506.03144 • Published Jun 3 • 7

BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation

Paper • 2505.12620 • Published May 19

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Paper • 2506.07971 • Published Jun 9 • 5

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

Paper • 2505.21541 • Published May 24 • 7

authored 2 papers 7 months ago

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7 • 82

DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency

Paper • 2504.12080 • Published Apr 16 • 8

authored 5 papers 8 months ago

RelationBooth: Towards Relation-Aware Customized Object Generation

Paper • 2410.23280 • Published Oct 30, 2024 • 1

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning

Paper • 2409.15179 • Published Sep 23, 2024

PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners

Paper • 2410.04733 • Published Oct 7, 2024

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Paper • 2501.04670 • Published Jan 8

Point Cloud Mamba: Point Cloud Learning via State Space Model

Paper • 2403.00762 • Published Mar 1, 2024
