Semantic Diffusion Language Modeling

Published as a preprint, 2026 (second author)

Summary

A study of how the noising kernel in discrete diffusion language models affects training stability and generation quality. The work introduces a unified bias–variance trade-off framework for analyzing kernel choice.

Method — SemDLM

  • Semantic-neighborhood diffusion — corrupts tokens toward semantically similar tokens rather than uniformly, reducing training bias.
  • Shared refresh branch — couples noising with a refresh mechanism that lowers training variance.
  • Token-frequency prior correction — re-weights kernel probabilities by frequency to align training and sampling distributions.

Together, these three choices reduce both bias and variance during training and strengthen the model's ability to correct its own errors at sampling time.
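To make the first and third ideas above concrete, here is a minimal NumPy sketch of a noising kernel that corrupts a token toward its nearest neighbors in embedding space and then re-weights those neighbors by unigram frequency. It illustrates the general idea only and is not the paper's implementation: the function names `semantic_kernel` and `corrupt`, the parameters `k`, `temperature`, and `alpha`, the use of cosine similarity, and the multiplicative form of the frequency re-weighting are all assumptions made for this example. The shared refresh branch is not modeled.

```python
import numpy as np


def semantic_kernel(emb, freq, k=2, temperature=0.5, alpha=1.0):
    """Build a row-stochastic matrix K, where K[i, j] is the probability
    of corrupting token i into token j.

    Hypothetical sketch: neighbors are chosen by cosine similarity and
    re-weighted by freq ** alpha (an assumed form of frequency correction).
    """
    # Cosine similarity between all pairs of token embeddings.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a token is never its own target

    vocab = sim.shape[0]
    rows = np.arange(vocab)[:, None]
    # Indices of the k most similar tokens for every row.
    nbrs = np.argsort(-sim, axis=1)[:, :k]

    # Softmax-style weights over the k neighbors (numerically stabilized).
    logits = sim[rows, nbrs] / temperature
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))

    # Assumed frequency re-weighting of the corruption targets.
    weights = weights * freq[nbrs] ** alpha

    K = np.zeros((vocab, vocab))
    K[rows, nbrs] = weights / weights.sum(axis=1, keepdims=True)
    return K


def corrupt(tokens, K, t, rng):
    """Independently replace each token with probability t by a sample
    drawn from that token's row of K."""
    tokens = np.asarray(tokens)
    out = tokens.copy()
    flip = rng.random(tokens.shape) < t
    for pos in np.flatnonzero(flip):
        out[pos] = rng.choice(K.shape[0], p=K[tokens[pos]])
    return out


# Toy setup: 6 tokens with random 4-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
freq = np.array([0.30, 0.25, 0.20, 0.10, 0.10, 0.05])
K = semantic_kernel(emb, freq, k=2)
noisy = corrupt([0, 1, 2, 3, 4, 5], K, t=0.5, rng=rng)
```

In this sketch, a uniform kernel would correspond to setting `k` to the vocabulary size minus one with a very large `temperature` and `alpha=0`, which makes it a convenient baseline for comparing corruption behavior.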

Result

SemDLM reaches a test perplexity (PPL) of 27.19 on LM1B, outperforming several discrete-diffusion baselines.