Semantic Diffusion Language Modeling

Preprint, 2026 (second author)

Summary

A study of how the noising kernel in discrete diffusion language models affects training stability and generation quality. The work introduces a unified bias–variance trade-off framework for analyzing kernel choice.

Method — SemDLM

Semantic-neighborhood diffusion — corrupts tokens toward semantically similar tokens rather than uniformly, reducing training bias.
Shared refresh branch — couples noising with a refresh mechanism that lowers training variance.
Token-frequency prior correction — re-weights kernel probabilities by frequency to align training and sampling distributions.

Together, these three choices reduce both bias and variance during training and strengthen the model’s error-correction capacity at sampling time.
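To make the first and third ideas concrete, here is a minimal toy sketch of a neighborhood-restricted noising kernel with a frequency re-weighting. It is not the SemDLM implementation: the function names (`semantic_kernel`, `corrupt`), the top-`k` cosine-similarity neighborhood, the `exp(similarity)` weighting, and the `alpha` exponent on token frequency are all assumptions made for illustration, and the shared refresh branch is not modeled at all.

```python
import math
import random


def semantic_kernel(embeddings, freqs, k=2, alpha=1.0):
    """Build a row-stochastic kernel Q, where Q[i][j] is the probability
    of corrupting token i into token j.

    Assumptions (hypothetical, not from the paper): neighbors are the
    top-k tokens by cosine similarity, and each neighbor's weight is
    exp(similarity) * freq ** alpha.
    """
    n = len(embeddings)
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    kernel = []
    for i in range(n):
        sims = []
        for j in range(n):
            if j == i:
                continue  # a token is never its own corruption target here
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            sims.append((dot / (norms[i] * norms[j]), j))
        sims.sort(reverse=True)  # most similar first
        row = [0.0] * n
        for sim, j in sims[:k]:
            row[j] = math.exp(sim) * freqs[j] ** alpha
        total = sum(row)
        kernel.append([w / total for w in row])
    return kernel


def corrupt(tokens, kernel, t, rng):
    """With probability t, replace each token by a sample from its kernel row."""
    vocab = range(len(kernel))
    out = []
    for tok in tokens:
        if rng.random() < t:
            tok = rng.choices(vocab, weights=kernel[tok])[0]
        out.append(tok)
    return out


# Toy vocabulary: tokens 0/1 point one way, tokens 2/3 the other.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_freqs = [0.4, 0.1, 0.3, 0.2]
toy_kernel = semantic_kernel(toy_embeddings, toy_freqs, k=1)
noised = corrupt([0, 2, 1, 3], toy_kernel, t=0.5, rng=random.Random(0))
```

With `k=1` each token can only be corrupted into its single nearest neighbor, so corruption stays inside a semantic cluster. A uniform kernel would instead spread probability over the whole vocabulary. Setting `alpha=0` switches the frequency term off, and the sign and size of `alpha` are free choices in this sketch rather than values taken from the work.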

Result

Test perplexity (PPL) of 27.19 on LM1B, outperforming several discrete-diffusion baselines.
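For readers unfamiliar with the metric, perplexity is the exponential of the average per-token negative log-likelihood, so a PPL of 27.19 corresponds to roughly 3.30 nats per token. The helper below is a generic illustration with a hypothetical name (`perplexity`). Diffusion language models often report perplexity computed from a variational bound rather than an exact likelihood, and this summary does not say which convention is used here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p_i)); lower is better.
    """
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
```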


Yuxiang Wang
