Lottery Ticket Hypothesis (4 posts)
Interpretability (ML & AI) (82 posts)
135 | Understanding “Deep Double Descent” | evhub | 3y | 51
80 | Gradations of Inner Alignment Obstacles | abramdemski | 1y | 22
50 | Understanding the Lottery Ticket Hypothesis | Alex Flint | 1y | 9
14 | Does the lottery ticket hypothesis suggest the scaling hypothesis? | Daniel Kokotajlo | 2y | 17
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
136 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
136 | A transparency and interpretability tech tree | evhub | 6mo | 10
123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
123 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
118 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
118 | The case for becoming a black-box investigator of language models | Buck | 7mo | 19
106 | A Longlist of Theories of Impact for Interpretability | Neel Nanda | 9mo | 29
99 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
94 | Circumventing interpretability: How to defeat mind-readers | Lee Sharkey | 5mo | 8
89 | Search versus design | Alex Flint | 2y | 41