Tree of Tags

4 posts Lottery Ticket Hypothesis

82 posts Interpretability (ML & AI)

97 Gradations of Inner Alignment Obstacles

abramdemski

1y

22

56 Understanding the Lottery Ticket Hypothesis

Alex Flint

1y

9

106 Understanding “Deep Double Descent”

evhub

3y

51

18 Does the lottery ticket hypothesis suggest the scaling hypothesis?

Daniel Kokotajlo

2y

17

106 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

70 Can we efficiently explain model behaviors?

paulfchristiano

4d

0

172 The Plan - 2022 Update

johnswentworth

19d

33

31 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

36 [ASoT] Natural abstractions and AlphaZero

Ulisse Mini

10d

1

83 The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren

22d

27

58 Multi-Component Learning and S-Curves

Adam Jermyn

20d

24

47 Re-Examining LayerNorm

Eric Winsor

19d

8

64 Engineering Monosemanticity in Toy Models

Adam Jermyn

1mo

6

9 Extracting and Evaluating Causal Direction in LLMs' Activations

Fabien Roger

6d

2

230 A Mechanistic Interpretability Analysis of Grokking

Neel Nanda

4mo

39

42 By Default, GPTs Think In Plain Sight

Fabien Roger

1mo

16

18 Subsets and quotients in interpretability

Erik Jenner

18d

1

51 Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda

1mo

14