4 posts
Lottery Ticket Hypothesis
82 posts
Interpretability (ML & AI)
135
Understanding “Deep Double Descent”
evhub
3y
51
50
Understanding the Lottery Ticket Hypothesis
Alex Flint
1y
9
14
Does the lottery ticket hypothesis suggest the scaling hypothesis?
Daniel Kokotajlo
2y
17
80
Gradations of Inner Alignment Obstacles
abramdemski
1y
22
123
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
26
Paper: Transformers learn in-context by gradient descent
LawrenceC
4d
11
211
The Plan - 2022 Update
johnswentworth
19d
33
13
The limited upside of interpretability
Peter S. Park
1mo
11
159
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren
22d
27
57
Multi-Component Learning and S-Curves
Adam Jermyn
20d
24
60
By Default, GPTs Think In Plain Sight
Fabien Roger
1mo
16
47
"Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability
David Scott Krueger (formerly: capybaralet)
1mo
25
99
Re-Examining LayerNorm
Eric Winsor
19d
8
22
Extracting and Evaluating Causal Direction in LLMs' Activations
Fabien Roger
6d
2
29
A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)
Neel Nanda
1mo
15
68
Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?
Neel Nanda
1mo
14
46
A Mystery About High Dimensional Concept Encoding
Fabien Roger
1mo
13
123
Interpreting Neural Networks through the Polytope Lens
Sid Black
2mo
26