Ordinary differential equations (ODEs) and related differential-equation formalisms are powerful mathematical tools for modeling a wide range of deep learning problems. In essence, an ODE describes how the state of a system changes as time varies. Here "time" can represent physical time, network depth, or the noise level used in generative modeling. The canonical form of an ODE is:

$$\frac{d\mathbf{z}(t)}{dt} = f(\mathbf{z}(t), t)$$

where $\mathbf{z}(t)$ denotes the state of the system at time $t$, and $f$ is a function that specifies how the state evolves.
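As a concrete illustration (ours, not taken from the article), an ODE in this canonical form can be integrated numerically. Below is a minimal forward-Euler sketch in Python, using the toy dynamics $f(z, t) = -z$, whose exact solution is $z(t) = z_0 e^{-t}$:

```python
import math

def euler_solve(f, z0, t0, t1, steps):
    """Integrate dz/dt = f(z, t) from t0 to t1 with the forward Euler method."""
    z, t = z0, t0
    h = (t1 - t0) / steps          # step size
    for _ in range(steps):
        z = z + h * f(z, t)        # one Euler step: z_{k+1} = z_k + h * f(z_k, t_k)
        t += h
    return z

# Toy dynamics f(z, t) = -z; the exact solution is z(t) = z0 * exp(-t).
z_approx = euler_solve(lambda z, t: -z, z0=1.0, t0=0.0, t1=1.0, steps=1000)
print(z_approx)                    # close to exp(-1) ≈ 0.3679
```

Smaller step sizes give a more accurate approximation, at the cost of more function evaluations; this accuracy/compute trade-off recurs throughout the ODE view of deep networks.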
In the article Ordinary Differential Equations in Vision and Language, we explore how ODEs inspire the design of advanced AI models through three progressive levels. First, we interpret classic residual networks and their variants as direct discretizations of ODEs, examining how this perspective guides the design of architectures such as Transformers. Second, we introduce neural ODEs, which are based on fully continuous-time modeling, and their applications in flows. Finally, we discuss diffusion models as a bridge between the discrete and continuous frameworks, and outline how they model generation processes in vision and language from the perspective of differential equations.
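To make the first of these connections concrete, here is a minimal sketch (our illustration, not code from the article) showing that a residual update $z + f(z)$ is exactly one forward-Euler step of $dz/dt = f(z)$ with step size 1:

```python
def residual_block(z, f):
    """A residual update as used in ResNets and Transformers: z_{l+1} = z_l + f(z_l)."""
    return z + f(z)

def euler_step(z, f, h):
    """One forward-Euler step for dz/dt = f(z): z_{t+h} = z_t + h * f(z_t)."""
    return z + h * f(z)

# With step size h = 1, the two updates coincide for any sub-layer function f.
f = lambda z: 0.5 * z              # a toy sub-layer (e.g., a scalar linear map)
print(residual_block(2.0, f))      # 3.0
print(euler_step(2.0, f, h=1.0))   # 3.0
```

In this reading, network depth plays the role of integration time, and stacking residual blocks traces out an approximate ODE trajectory through the feature space.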
This article seeks to maintain a balance between depth and readability. It provides a systematic explanation of the essential concepts and methods of ODEs while covering their typical applications in language and vision tasks. Overall, this work can serve as an introductory guide to ODEs and their applications in deep learning.
The full version of the article is available here: [pdf].
We also provide individual chapters to facilitate reading:
- Chapter 1: Mathematical Preliminaries [pdf]
- Chapter 2: The ODE Perspective on Transformers [pdf]
- Chapter 3: Neural ODEs and Flows [pdf]
- Chapter 4: Diffusion Models in Vision [pdf]
- Chapter 5: Diffusion Models in Language [pdf]
- Chapter 6: Conclusions and Future Directions [pdf]
To cite this article, please use the following BibTeX entry:
@article{xiao-etal:2026ode,
  title={Ordinary Differential Equations in Vision and Language},
  author={Tong Xiao and Junhao Ruan and Bei Li and Zhengtao Yu and Min Zhang and Jingbo Zhu},
  journal={TechRxiv preprint TechRxiv.177160477.75893679},
  year={2026}
}

For any issues or comments, please feel free to contact the authors via e-mail: xiaotong [at] mail.neu.edu.cn or rangehow [at] outlook.com.
