Tree of Tags

4 posts Lottery Ticket Hypothesis

82 posts Interpretability (ML & AI)

164 Understanding “Deep Double Descent”

evhub

3y

51

63 Gradations of Inner Alignment Obstacles

abramdemski

1y

22

44 Understanding the Lottery Ticket Hypothesis

Alex Flint

1y

9

10 Does the lottery ticket hypothesis suggest the scaling hypothesis?

Daniel Kokotajlo

2y

17

140 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

56 Can we efficiently explain model behaviors?

paulfchristiano

4d

0

250 The Plan - 2022 Update

johnswentworth

19d

33

235 The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren

22d

27

151 Re-Examining LayerNorm

Eric Winsor

19d

8

35 Extracting and Evaluating Causal Direction in LLMs' Activations

Fabien Roger

6d

2

21 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

26 [ASoT] Natural abstractions and AlphaZero

Ulisse Mini

10d

1

56 Multi-Component Learning and S-Curves

Adam Jermyn

20d

24

446 A Mechanistic Interpretability Analysis of Grokking

Neel Nanda

4mo

39

78 By Default, GPTs Think In Plain Sight

Fabien Roger

1mo

16

80 Engineering Monosemanticity in Toy Models

Adam Jermyn

1mo

6

201 Interpreting Neural Networks through the Polytope Lens

Sid Black

2mo

26

30 Subsets and quotients in interpretability

Erik Jenner

18d

1