4 posts
Lottery Ticket Hypothesis
82 posts
Interpretability (ML & AI)
97
Gradations of Inner Alignment Obstacles
abramdemski
1y
22
56
Understanding the Lottery Ticket Hypothesis
Alex Flint
1y
9
106
Understanding “Deep Double Descent”
evhub
3y
51
18
Does the lottery ticket hypothesis suggest the scaling hypothesis?
Daniel Kokotajlo
2y
17
106
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
70
Can we efficiently explain model behaviors?
paulfchristiano
4d
0
172
The Plan - 2022 Update
johnswentworth
19d
33
31
Paper: Transformers learn in-context by gradient descent
LawrenceC
4d
11
36
[ASoT] Natural abstractions and AlphaZero
Ulisse Mini
10d
1
83
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren
22d
27
58
Multi-Component Learning and S-Curves
Adam Jermyn
20d
24
47
Re-Examining LayerNorm
Eric Winsor
19d
8
64
Engineering Monosemanticity in Toy Models
Adam Jermyn
1mo
6
9
Extracting and Evaluating Causal Direction in LLMs' Activations
Fabien Roger
6d
2
230
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda
4mo
39
42
By Default, GPTs Think In Plain Sight
Fabien Roger
1mo
16
18
Subsets and quotients in interpretability
Erik Jenner
18d
1
51
Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?
Neel Nanda
1mo
14