Tree of Tags

0 posts Empiricism

47 posts Interpretability (ML & AI)

123 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

26 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

211 The Plan - 2022 Update

johnswentworth

19d

33

57 Multi-Component Learning and S-Curves

Adam Jermyn

20d

24

47 "Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet)

1mo

25

22 Extracting and Evaluating Causal Direction in LLMs' Activations

Fabien Roger

6d

2

29 A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)

Neel Nanda

1mo

15

68 Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda

1mo

14

27 Toy Models and Tegum Products

Adam Jermyn

1mo

7

78 Polysemanticity and Capacity in Neural Networks

Buck

2mo

9

72 Engineering Monosemanticity in Toy Models

Adam Jermyn

1mo

6

24 Subsets and quotients in interpretability

Erik Jenner

18d

1

338 A Mechanistic Interpretability Analysis of Grokking

Neel Nanda

4mo

39

49 A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda

1mo

5