0 posts
Empiricism
47 posts
Interpretability (ML & AI)
123
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
26
Paper: Transformers learn in-context by gradient descent
LawrenceC
4d
11
211
The Plan - 2022 Update
johnswentworth
19d
33
57
Multi-Component Learning and S-Curves
Adam Jermyn
20d
24
47
"Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability
David Scott Krueger (formerly: capybaralet)
1mo
25
22
Extracting and Evaluating Causal Direction in LLMs' Activations
Fabien Roger
6d
2
29
A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)
Neel Nanda
1mo
15
68
Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?
Neel Nanda
1mo
14
27
Toy Models and Tegum Products
Adam Jermyn
1mo
7
78
Polysemanticity and Capacity in Neural Networks
Buck
2mo
9
72
Engineering Monosemanticity in Toy Models
Adam Jermyn
1mo
6
24
Subsets and quotients in interpretability
Erik Jenner
18d
1
338
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda
4mo
39
49
A Walkthrough of A Mathematical Framework for Transformer Circuits
Neel Nanda
1mo
5