paper-reading - a kiyoxi2022 Collection

updated about 9 hours ago

Code as Agent Harness

Paper • 2605.18747 • Published 17 days ago • 212
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 23 days ago • 191
From Context to Skills: Can Language Models Learn from Context Skillfully?

Paper • 2604.27660 • Published May 3 • 166
PhysBrain 1.0 Technical Report

Paper • 2605.15298 • Published 21 days ago • 143
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Paper • 2605.20025 • Published 16 days ago • 185
MMSkills: Towards Multimodal Skills for General Visual Agents

Paper • 2605.13527 • Published 21 days ago • 118
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

Paper • 2605.06130 • Published 28 days ago • 112
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published 17 days ago • 112
Qwen-Image-2.0 Technical Report

Paper • 2605.10730 • Published 24 days ago • 110
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

Paper • 2605.18233 • Published 17 days ago • 92
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

Paper • 2605.00658 • Published May 1 • 84
Lance: Unified Multimodal Modeling by Multi-Task Synergy

Paper • 2605.18678 • Published 17 days ago • 78
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion

Paper • 2605.23902 • Published 13 days ago • 45
Qwen/Qwen-Image-Bench

Image-Text-to-Text • 27B • Updated 7 days ago • 6.69k • 48
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16, 2025 • 28
baidu/ERNIE-Image

Text-to-Image • Updated Apr 17 • 19k • 640
GenClaw: Code-Driven Agentic Image Generation

Paper • 2605.30248 • Published 7 days ago • 36
ByteDance-Seed/Cola-DLM

Text Generation • Updated 20 days ago • 35
deepseek-ai/DeepSeek-V4-Pro

Text Generation • 862B • Updated 29 days ago • 5.81M • 4.6k
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

Paper • 2605.02641 • Published May 4 • 1
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

Paper • 2605.28816 • Published 8 days ago • 419
MeiGen-AI/GenEvolve

Image-Text-to-Text • 9B • Updated 13 days ago • 159 • 6
CostaliyA/Flow-OPD

Text-to-Image • Updated 19 days ago • 86 • 1
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 109
ByteDance/Bernini-R

Image-Text-to-Video • Updated 1 day ago • 89 • 95
Trust Region On-Policy Distillation

Paper • 2606.01249 • Published 4 days ago • 33
nvidia/Cosmos3-Super

65B • Updated 3 days ago • 3.95k • 113