Tree of Tags

4 posts Lottery Ticket Hypothesis

82 posts Interpretability (ML & AI)

164 Understanding “Deep Double Descent”

evhub

3y

51

63 Gradations of Inner Alignment Obstacles

abramdemski

1y

22

44 Understanding the Lottery Ticket Hypothesis

Alex Flint

1y

9

10 Does the lottery ticket hypothesis suggest the scaling hypothesis?

Daniel Kokotajlo

2y

17

446 A Mechanistic Interpretability Analysis of Grokking

Neel Nanda

4mo

39

250 The Plan - 2022 Update

johnswentworth

19d

33

235 The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren

22d

27

201 Interpreting Neural Networks through the Polytope Lens

Sid Black

2mo

26

151 Re-Examining LayerNorm

Eric Winsor

19d

8

140 Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

lifelonglearner

1y

16

140 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

134 Circumventing interpretability: How to defeat mind-readers

Lee Sharkey

5mo

8

131 A Longlist of Theories of Impact for Interpretability

Neel Nanda

9mo

29

128 The case for becoming a black-box investigator of language models

Buck

7mo

19

123 A transparency and interpretability tech tree

evhub

6mo

10

114 MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"

Rob Bensinger

1y

13

106 Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc

johnswentworth

6mo

52

106 More Recent Progress in the Theory of Neural Networks

jylin04

2mo

6