DiSA: Diffusion Step Annealing in Autoregressive Image Generation

Qinyu Zhao¹ · Jaskirat Singh¹ · Ming Xu¹ · Akshay Asthana² · Stephen Gould¹ · Liang Zheng¹

¹ Australian National University ² Seeing Machines

Overview

Recent autoregressive models such as MAR, FlowAR, xAR, and Harmon adopt diffusion sampling to improve image generation quality. However, this makes inference slow, because sampling usually requires 50-100 diffusion steps per token. We introduce Diffusion Step Annealing (DiSA), a training-free method that gradually reduces the number of diffusion steps as more tokens are generated, achieving significant speedup while maintaining generation quality.
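To make the idea of annealing concrete, the following is a minimal sketch of a step schedule that starts with many diffusion steps and ends with few. The linear shape, the function name `annealed_steps`, and the default values of 50 and 5 steps are illustrative assumptions, not settings taken from the paper or its code.

```python
def annealed_steps(num_ar_steps, start_steps=50, end_steps=5):
    """Return one diffusion step count per autoregressive step.

    Hypothetical helper: linearly interpolates from start_steps (early,
    weakly constrained tokens) down to end_steps (late, strongly
    constrained tokens). The linear shape and defaults are assumptions.
    """
    if num_ar_steps < 1:
        raise ValueError("num_ar_steps must be at least 1")
    if num_ar_steps == 1:
        return [start_steps]
    schedule = []
    for i in range(num_ar_steps):
        frac = i / (num_ar_steps - 1)  # 0.0 at the first step, 1.0 at the last
        steps = round(start_steps + frac * (end_steps - start_steps))
        schedule.append(max(1, steps))  # always run at least one diffusion step
    return schedule


def total_diffusion_steps(schedule):
    """Total number of diffusion steps (a rough proxy for sampling cost)."""
    return sum(schedule)
```

Under these assumptions, comparing `total_diffusion_steps(annealed_steps(n))` against `n * 50` gives a rough sense of how much denoising work the schedule saves relative to a fixed budget. Because the method is training-free, a schedule like this would only change how many steps the sampler runs at each autoregressive step, not the model weights.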

Key Insight

As more tokens are generated during the autoregressive process, subsequent tokens follow more constrained distributions and are easier to sample.

Evidence 1: Next tokens can be well predicted in later generation stages.

Evidence 2: Next tokens have lower variance at later autoregressive steps, as shown in (a) and (b).

Evidence 3: Diffusion paths at later stages are closer to straight lines, as shown in (c) above.
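The variance and straightness observations above can be quantified in several ways. The sketch below shows one possible pair of measurements: the mean per-dimension variance across repeated samples of the same token, and the ratio of a path's endpoint distance to its total length (1.0 for a perfectly straight path). Both function names and both metrics are assumptions chosen for illustration; they are not necessarily the measurements used by the authors.

```python
import math


def mean_token_variance(samples):
    """Mean per-dimension (population) variance across samples of one token.

    samples: list of equal-length lists of floats, one per sampled token.
    Lower values indicate a more constrained token distribution.
    """
    n = len(samples)
    dim = len(samples[0])
    total = 0.0
    for j in range(dim):
        column = [s[j] for s in samples]
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / n
    return total / dim


def path_straightness(path):
    """Ratio of endpoint distance to total path length, in [0, 1].

    path: list of points (lists of floats) visited during denoising.
    A value of 1.0 means the path is a straight line.
    """
    if len(path) < 2:
        raise ValueError("path needs at least two points")
    chord = math.dist(path[0], path[-1])
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    if length == 0.0:
        return 1.0  # a path that never moves is treated as straight
    return chord / length
```

A straighter path is easier to follow with few, large integration steps, which is consistent with using fewer diffusion steps at later autoregressive stages.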

Results

DiSA achieves significant speedups while maintaining generation quality.

For MAR-H and Harmon-1.5B, we present the samples generated using DiSA.

For FlowAR and xAR, each image pair is generated with the same random seed: the first image without DiSA and the second with DiSA.

Citation

@article{zhao2025disa,
  title={DiSA: Diffusion Step Annealing in Autoregressive Image Generation},
  author={Zhao, Qinyu and Singh, Jaskirat and Xu, Ming and Asthana, Akshay and Gould, Stephen and Zheng, Liang},
  year={2025},
  journal={arXiv preprint arXiv:2505.20297},
}

We thank the REPA and REPA-E projects for the website template.