Lottery Ticket Hypothesis (4 posts)
Interpretability (ML & AI) (82 posts)
80 | Gradations of Inner Alignment Obstacles | abramdemski | 1y | 22
135 | Understanding “Deep Double Descent” | evhub | 3y | 51
50 | Understanding the Lottery Ticket Hypothesis | Alex Flint | 1y | 9
14 | Does the lottery ticket hypothesis suggest the scaling hypothesis? | Daniel Kokotajlo | 2y | 17
123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
63 | Can we efficiently explain model behaviors? | paulfchristiano | 4d | 0
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
26 | Paper: Transformers learn in-context by gradient descent | LawrenceC | 4d | 11
159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
99 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
22 | Extracting and Evaluating Causal Direction in LLMs' Activations | Fabien Roger | 6d | 2
31 | [ASoT] Natural abstractions and AlphaZero | Ulisse Mini | 10d | 1
57 | Multi-Component Learning and S-Curves | Adam Jermyn | 20d | 24
72 | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
60 | By Default, GPTs Think In Plain Sight | Fabien Roger | 1mo | 16
24 | Subsets and quotients in interpretability | Erik Jenner | 18d | 1
68 | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14