Tree of Tags

4 posts Lottery Ticket Hypothesis

82 posts Interpretability (ML & AI)

164 Understanding “Deep Double Descent”

evhub

3y

51

44 Understanding the Lottery Ticket Hypothesis

Alex Flint

1y

9

10 Does the lottery ticket hypothesis suggest the scaling hypothesis?

Daniel Kokotajlo

2y

17

63 Gradations of Inner Alignment Obstacles

abramdemski

1y

22

140 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

21 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

250 The Plan - 2022 Update

johnswentworth

19d

33

26 The limited upside of interpretability

Peter S. Park

1mo

11

235 The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren

22d

27

56 Multi-Component Learning and S-Curves

Adam Jermyn

20d

24

78 By Default, GPTs Think In Plain Sight

Fabien Roger

1mo

16

45 "Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet)

1mo

25

151 Re-Examining LayerNorm

Eric Winsor

19d

8

35 Extracting and Evaluating Causal Direction in LLMs' Activations

Fabien Roger

6d

2

31 A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

Neel Nanda

1mo

15

85 Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda

1mo

14

56 A Mystery About High Dimensional Concept Encoding

Fabien Roger

1mo

13

201 Interpreting Neural Networks through the Polytope Lens

Sid Black

2mo

26