Tree of Tags

Empiricism (0 posts)

Interpretability (ML & AI) (47 posts)

Karma · Title · Author · Age · Comments

338 · A Mechanistic Interpretability Analysis of Grokking · Neel Nanda · 4mo · 39
211 · The Plan - 2022 Update · johnswentworth · 19d · 33
197 · Chris Olah’s views on AGI safety · evhub · 3y · 38
139 · Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers · lifelonglearner · 1y · 16
136 · A transparency and interpretability tech tree · evhub · 6mo · 10
123 · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18
118 · Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc · johnswentworth · 6mo · 52
111 · Interpretability/Tool-ness/Alignment/Corrigibility are not Composable · johnswentworth · 4mo · 8
106 · A Longlist of Theories of Impact for Interpretability · Neel Nanda · 9mo · 29
99 · Re-Examining LayerNorm · Eric Winsor · 19d · 8
89 · Search versus design · Alex Flint · 2y · 41
78 · Polysemanticity and Capacity in Neural Networks · Buck · 2mo · 9
78 · More Recent Progress in the Theory of Neural Networks · jylin04 · 2mo · 6
72 · Engineering Monosemanticity in Toy Models · Adam Jermyn · 1mo · 6