Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.13759

about 3 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Discrete Diffusion LLM & MLLM

An collection of research/models in discrete diffusion large language and multimodal models

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42
GSAI-ML/LLaDA-8B-Instruct

Text Generation • 8B • Updated Feb 27 • 202k • 304
Dream-org/Dream-v0-Base-7B

Text Generation • 8B • Updated 16 days ago • 12.4k • 42
Dream-org/Dream-v0-Instruct-7B

Text Generation • 8B • Updated 16 days ago • 23.5k • 121

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28 • 42
DINGO: Constrained Inference for Diffusion LLMs

Paper • 2505.23061 • Published May 29 • 31
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

Paper • 2506.14429 • Published Jun 17 • 44

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 165
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 23
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23 • 14

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 66
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 37
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22, 2024 • 17
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 62

Reference Papers

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 128
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Paper • 2506.18841 • Published Jun 23 • 56
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Paper • 2506.19767 • Published Jun 24 • 13

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21 • 55
Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

Paper • 2505.23115 • Published May 29 • 2
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Paper • 2503.19325 • Published Mar 25 • 73
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 98
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42

about 3 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Reference Papers

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42

Discrete Diffusion LLM & MLLM

An collection of research/models in discrete diffusion large language and multimodal models

Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42
GSAI-ML/LLaDA-8B-Instruct

Text Generation • 8B • Updated Feb 27 • 202k • 304
Dream-org/Dream-v0-Base-7B

Text Generation • 8B • Updated 16 days ago • 12.4k • 42
Dream-org/Dream-v0-Instruct-7B

Text Generation • 8B • Updated 16 days ago • 23.5k • 121

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 128
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Paper • 2506.18841 • Published Jun 23 • 56
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Paper • 2506.19767 • Published Jun 24 • 13

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28 • 42
DINGO: Constrained Inference for Diffusion LLMs

Paper • 2505.23061 • Published May 29 • 31
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

Paper • 2506.14429 • Published Jun 17 • 44

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21 • 55
Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

Paper • 2505.23115 • Published May 29 • 2
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 165
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 23
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23 • 14

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Paper • 2503.19325 • Published Mar 25 • 73
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 98
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16 • 42

AI Math: Diffusion

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 66
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 37
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22, 2024 • 17
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 62

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs