Posts

Introspective interpretability

Language Models, World Models, and Human Model-Building

Notes on Teaching GPT-3 Adding Numbers