Empiricism
47 posts
Interpretability (ML & AI)
[338] A Mechanistic Interpretability Analysis of Grokking, by Neel Nanda (4mo, 39 comments)
[211] The Plan - 2022 Update, by johnswentworth (19d, 33 comments)
[197] Chris Olah’s views on AGI safety, by evhub (3y, 38 comments)
[139] Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers, by lifelonglearner (1y, 16 comments)
[136] A transparency and interpretability tech tree, by evhub (6mo, 10 comments)
[123] How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme, by Collin (5d, 18 comments)
[118] Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc, by johnswentworth (6mo, 52 comments)
[111] Interpretability/Tool-ness/Alignment/Corrigibility are not Composable, by johnswentworth (4mo, 8 comments)
[106] A Longlist of Theories of Impact for Interpretability, by Neel Nanda (9mo, 29 comments)
[99] Re-Examining LayerNorm, by Eric Winsor (19d, 8 comments)
[89] Search versus design, by Alex Flint (2y, 41 comments)
[78] Polysemanticity and Capacity in Neural Networks, by Buck (2mo, 9 comments)
[78] More Recent Progress in the Theory of Neural Networks, by jylin04 (2mo, 6 comments)
[72] Engineering Monosemanticity in Toy Models, by Adam Jermyn (1mo, 6 comments)