Recent autoregressive models such as MAR, FlowAR, xAR, and Harmon adopt diffusion sampling to improve image generation quality. However, this makes inference slow, because sampling each token usually requires 50-100 diffusion steps. We introduce Diffusion Step Annealing (DiSA), a training-free method that gradually reduces the number of diffusion steps as more tokens are generated, achieving significant speedup while maintaining generation quality, as shown below.
As more tokens are generated during the autoregressive process, subsequent tokens follow more constrained distributions and are easier to sample.
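The annealing idea can be illustrated with a minimal sketch. It assumes a simple linear schedule from 50 steps for the first token down to 5 steps for the last. The function name `annealed_steps`, the endpoint values, the token count of 256, and the linear shape are all illustrative assumptions, not the schedule or code used by DiSA.

```python
def annealed_steps(token_index, num_tokens, start_steps=50, end_steps=5):
    """Return a diffusion step count for the token at `token_index`.

    Hypothetical linear schedule: early tokens get `start_steps`,
    late tokens get `end_steps`, with linear interpolation in between.
    """
    if num_tokens <= 1:
        return start_steps
    # Fraction of the generation process already completed, in [0, 1].
    frac = token_index / (num_tokens - 1)
    return round(start_steps + (end_steps - start_steps) * frac)


# Per-token step counts for a hypothetical sequence of 256 tokens.
num_tokens = 256
schedule = [annealed_steps(i, num_tokens) for i in range(num_tokens)]

# Compare total diffusion steps against a constant 50-step baseline.
baseline_total = 50 * num_tokens
annealed_total = sum(schedule)
print(f"baseline: {baseline_total}, annealed: {annealed_total}")
```

Under these assumed settings, the first token still receives the full 50 steps while later, more constrained tokens receive progressively fewer, so the total number of diffusion steps drops well below the constant-step baseline.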
DiSA achieves significant speedups while maintaining generation quality.
For MAR-H and Harmon-1.5B, we present samples generated using DiSA.
For FlowAR and xAR, each image pair is generated with the same random seed: the first image without DiSA and the second with DiSA.
@article{zhao2025disa,
title={DiSA: Diffusion Step Annealing in Autoregressive Image Generation},
author={Zhao, Qinyu and Singh, Jaskirat and Xu, Ming and Asthana, Akshay and Gould, Stephen and Zheng, Liang},
year={2025},
journal={arXiv preprint arXiv:2505.20297},
}
We thank the REPA and REPA-E projects for the website template.