Grokking

Can we observe grokking on modular addition in a toy example?

This is inspired by https://arxiv.org/abs/2301.05217, but it trains an MLP instead of a transformer.
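To make the task concrete, here is a minimal sketch of how the modular addition dataset could be built for an MLP. It is not taken from this repository: the function names, the modulus `p = 113`, the 30% training fraction, and the concatenated one-hot input encoding are all assumptions chosen for illustration.

```python
import random


def make_modular_addition_data(p=113, train_frac=0.3, seed=0):
    """Enumerate every pair (a, b) with label (a + b) mod p, then split.

    p, train_frac and seed are illustrative defaults (assumptions),
    not values read from this repository.
    """
    pairs = [(a, b) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)  # random split so train and test cover all residues
    n_train = int(train_frac * len(pairs))
    train = [(a, b, (a + b) % p) for a, b in pairs[:n_train]]
    test = [(a, b, (a + b) % p) for a, b in pairs[n_train:]]
    return train, test


def one_hot_pair(a, b, p):
    """Concatenate one-hot encodings of a and b into one input of length 2p."""
    x = [0.0] * (2 * p)
    x[a] = 1.0
    x[p + b] = 1.0
    return x


train, test = make_modular_addition_data()
```

With a split like this, grokking would show up as test accuracy rising sharply long after training accuracy has saturated, so both metrics need to be logged over many training steps.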