Jiaming Han

csuhan

https://csuhan.com

AI & ML interests

Computer Vision

Organizations

None yet

Recent Activity

upvoted a paper about 1 hour ago

OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published 1 day ago • 15

upvoted a paper 3 days ago

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published 7 days ago • 127

upvoted a paper 4 days ago

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation

Paper • 2511.20256 • Published 9 days ago • 26

upvoted a paper about 2 months ago

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning

Paper • 2510.08555 • Published Oct 9 • 63

upvoted a collection 2 months ago

Qwen3-Omni

Collection • 6 items • Updated Oct 9 • 166

upvoted 2 papers 3 months ago

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Paper • 2509.09680 • Published Sep 11 • 43

Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing

Paper • 2509.01984 • Published Sep 2 • 6

upvoted an article 4 months ago

"Diffusers Image Fill" guide

Article • Sep 13, 2024

upvoted 2 papers 4 months ago

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published Aug 7 • 73

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Paper • 2507.22827 • Published Jul 30 • 98

upvoted a collection 5 months ago

OmniCorpus

Collection • A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text • 6 items • Updated Sep 28 • 3

upvoted a paper 5 months ago

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Paper • 2506.18898 • Published Jun 23 • 33

upvoted 2 papers 6 months ago

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Paper • 2506.09350 • Published Jun 11 • 48

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Paper • 2506.05301 • Published Jun 5 • 56

upvoted 2 papers 8 months ago

Multimodal Long Video Modeling Based on Temporal Dynamic Context

Paper • 2504.10443 • Published Apr 14 • 3

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published Apr 11 • 130

upvoted 3 papers 9 months ago

Long Context Tuning for Video Generation

Paper • 2503.10589 • Published Mar 13 • 14

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53

Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

Paper • 2502.16707 • Published Feb 23 • 13

upvoted a paper 11 months ago

Diffusion Adversarial Post-Training for One-Step Video Generation

Paper • 2501.08316 • Published Jan 14 • 35
