Questions about whether it is an autoregressive model #2

@batch-norm

The LLaDA paper explicitly describes the model as a diffusion model rather than an autoregressive one. However, your code applies a lower-triangular attention mask, which imposes a causal dependency (each position attends only to itself and earlier positions) and effectively makes the model autoregressive. Does this conflict with the paper's core argument? Additionally, when I removed the lower-triangular mask from the source code, the loss decreased very slowly, and the test accuracy after 5 epochs was 0.
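To make the distinction concrete, here is a minimal sketch of how a lower-triangular mask differs from an unmasked (bidirectional) one. It is not taken from this repository; the helper names `causal_mask`, `full_mask`, and `attention_weights` are hypothetical, and the uniform scores are chosen only for illustration.

```python
import math

def causal_mask(n):
    """Lower-triangular mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def full_mask(n):
    """Bidirectional mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]

def attention_weights(scores, mask):
    """Row-wise softmax over scores; masked-out entries get zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        exps = [math.exp(s) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

n = 4
scores = [[0.0] * n for _ in range(n)]  # uniform scores, for illustration only

causal = attention_weights(scores, causal_mask(n))
full = attention_weights(scores, full_mask(n))

for i in range(n):
    print(i, causal[i], full[i])
```

With the causal mask, row 0 places all of its weight on position 0 and never sees later tokens. With the full mask, every row spreads its weight over all four positions, which is the bidirectional behaviour I would expect from a masked diffusion model.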
