DiSA: Diffusion Step Annealing in Autoregressive Image Generation

Qinyu Zhao¹ · Jaskirat Singh¹ · Ming Xu¹ · Akshay Asthana² · Stephen Gould¹ · Liang Zheng¹

¹ Australian National University   ² Seeing Machines

Overview


Recent autoregressive models such as MAR, FlowAR, xAR, and Harmon adopt diffusion sampling to improve image generation quality. However, this makes inference slow, because sampling each token usually requires 50 to 100 diffusion steps. We introduce Diffusion Step Annealing (DiSA), a training-free method that gradually reduces the number of diffusion steps as more tokens are generated, achieving a significant speedup while maintaining generation quality, as shown below.

Figure: Speedup of DiSA

Key Insight


As more tokens are generated during the autoregressive process, subsequent tokens follow more constrained distributions and are easier to sample.
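
To make the idea concrete, the following is a minimal sketch of one possible annealing schedule. It assumes a linear decay from 50 to 5 diffusion steps over 8 autoregressive iterations; the function name `annealed_steps`, the linear shape, and all three numbers are illustrative assumptions, not the schedule or settings used by the authors.

```python
def annealed_steps(num_iters: int, start_steps: int = 50, end_steps: int = 5) -> list[int]:
    """Return a diffusion step count for each autoregressive iteration.

    Hypothetical helper: interpolates linearly from start_steps (early
    iterations, few tokens known) down to end_steps (late iterations,
    most tokens known). The defaults are illustrative only.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    if num_iters == 1:
        return [start_steps]
    return [
        round(start_steps + (end_steps - start_steps) * i / (num_iters - 1))
        for i in range(num_iters)
    ]


# Compare the total number of diffusion steps against a constant schedule.
num_iters = 8  # assumed number of autoregressive iterations
schedule = annealed_steps(num_iters)
baseline_total = 50 * num_iters
annealed_total = sum(schedule)

print("steps per iteration:", schedule)
print("total steps, constant schedule:", baseline_total)
print("total steps, annealed schedule:", annealed_total)
```

The comparison counts diffusion steps only. It does not model wall-clock time, which also depends on the cost of the autoregressive backbone and on how many tokens are sampled in each iteration.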

Results


DiSA achieves significant speedups while maintaining generation quality.

Figure: Speed-Quality Trade-off

For MAR-H and Harmon-1.5B, we show samples generated with DiSA.

Figure: Generation examples (MAR-H and Harmon-1.5B, with DiSA)

For FlowAR and xAR, each image pair is generated with the same random seed: the first image is generated without DiSA and the second with DiSA.

Figure: Generation examples (FlowAR and xAR, without and with DiSA)

Citation


@article{zhao2025disa,
  title={DiSA: Diffusion Step Annealing in Autoregressive Image Generation},
  author={Zhao, Qinyu and Singh, Jaskirat and Xu, Ming and Asthana, Akshay and Gould, Stephen and Zheng, Liang},
  year={2025},
  journal={arXiv preprint arXiv:2505.20297}
}

We thank the REPA and REPA-E projects for the website template.