0 posts
Empiricism
47 posts
Interpretability (ML & AI)
132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
239 | The Plan - 2022 Update | johnswentworth | 19d | 33
142 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
33 | Extracting and Evaluating Causal Direction in LLMs' Activations | Fabien Roger | 6d | 2
20 | Paper: Transformers learn in-context by gradient descent | LawrenceC | 4d | 11
25 | [ASoT] Natural abstractions and AlphaZero | Ulisse Mini | 10d | 1
53 | Multi-Component Learning and S-Curves | Adam Jermyn | 20d | 24
422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
75 | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6
29 | Subsets and quotients in interpretability | Erik Jenner | 18d | 1
81 | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14
79 | A Barebones Guide to Mechanistic Interpretability Prerequisites | Neel Nanda | 1mo | 8
101 | More Recent Progress in the Theory of Neural Networks | jylin04 | 2mo | 6
81 | An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers | Neel Nanda | 2mo | 5