Tree of Tags

0 posts Empiricism

47 posts Interpretability (ML & AI)

132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18

239 | The Plan - 2022 Update | johnswentworth | 19d | 33

142 | Re-Examining LayerNorm | Eric Winsor | 19d | 8

33 | Extracting and Evaluating Causal Direction in LLMs' Activations | Fabien Roger | 6d | 2

20 | Paper: Transformers learn in-context by gradient descent | LawrenceC | 4d | 11

25 | [ASoT] Natural abstractions and AlphaZero | Ulisse Mini | 10d | 1

53 | Multi-Component Learning and S-Curves | Adam Jermyn | 20d | 24

422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39

75 | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6

29 | Subsets and quotients in interpretability | Erik Jenner | 18d | 1

81 | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14

79 | A Barebones Guide to Mechanistic Interpretability Prerequisites | Neel Nanda | 1mo | 8

101 | More Recent Progress in the Theory of Neural Networks | jylin04 | 2mo | 6

81 | An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers | Neel Nanda | 2mo | 5