Abstract
Duo improves uniform-state discrete diffusion models by transferring techniques from Gaussian diffusion, enhancing training speed and enabling fast few-step text generation.
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo
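For readers skimming the abstract, the key insight (a uniform-state discrete corruption arising from an underlying Gaussian diffusion) can be sketched in a few lines: Gaussian-diffuse a one-hot token vector and take the argmax. The snippet below is an illustrative sketch, not the paper's implementation; the function name, the variance-preserving noise schedule, and the Monte Carlo check are assumptions made for the demo.

```python
# Illustrative sketch (assumptions, not the authors' code): a uniform-state
# discrete corruption emerging from Gaussian diffusion. We add Gaussian noise
# to a one-hot token vector and take the argmax; at high noise the argmax is
# near-uniform over the vocabulary, at low noise it recovers the clean token.
import torch

def gaussian_to_discrete(token_id: int, vocab_size: int, alpha: float,
                         n_samples: int = 10_000) -> torch.Tensor:
    """Gaussian-diffuse a one-hot token, then argmax to a discrete token."""
    x = torch.zeros(vocab_size)
    x[token_id] = 1.0                        # clean one-hot token
    sigma = (1.0 - alpha ** 2) ** 0.5        # variance-preserving noise level (assumed schedule)
    w = alpha * x + sigma * torch.randn(n_samples, vocab_size)  # Gaussian latents
    return w.argmax(dim=-1)                  # induced discrete corruption

if __name__ == "__main__":
    for alpha in (0.99, 0.5, 0.01):
        y = gaussian_to_discrete(token_id=3, vocab_size=8, alpha=alpha)
        frac_clean = (y == 3).float().mean().item()
        print(f"alpha={alpha:.2f}  P(argmax = clean token) ~ {frac_clean:.3f}")
    # As alpha -> 0 the argmax distribution approaches uniform over the
    # vocabulary, matching the uniform-state limit described in the abstract.
```

Under these assumptions, interpolating alpha between 1 and 0 traces out a discrete corruption process whose noise level is controlled by the underlying Gaussian schedule, which is the correspondence the abstract refers to as the basis for curriculum training and consistency distillation.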
Community
We enable few-step generation in diffusion language models by exploiting the underlying Gaussian diffusion.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Esoteric Language Models (2025)
- Anchored Diffusion Language Model (2025)
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes (2025)
- Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation (2025)
- ReDDiT: Rehashing Noise for Discrete Visual Generation (2025)
- Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion (2025)
- A Convergence Theory for Diffusion Language Models: An Information-Theoretic Perspective (2025)
Related: https://arxiv.org/abs/2502.11564 - but I must say your solution is a lot more direct and emergent.
Really cool paper! I'm still reading through it in full. I wonder if methods like this could be compatible with diffusion forcing (https://arxiv.org/abs/2407.01392) to generate very long samples in a stable way.