Machine Learning (ML) (51 posts)
Related tags: DeepMind; OpenAI; Truth, Semantics, & Meaning; Lottery Ticket Hypothesis; Honesty; Anthropic; Map and Territory; Calibration

Karma | Title | Author | Posted | Comments
410 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
253 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
173 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
169 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
156 | Understanding “Deep Double Descent” | evhub | 3y | 51
140 | Clarifying AI X-risk | zac_kenton | 1mo | 23
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
101 | Paper: Teaching GPT3 to express uncertainty in words | Owain_Evans | 6mo | 7
97 | Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible | Sam Bowman | 3mo | 6
94 | Safety Implications of LeCun's path to machine intelligence | Ivan Vendrov | 5mo | 16
74 | Paper: Discovering novel algorithms with AlphaTensor [Deepmind] | LawrenceC | 2mo | 18
71 | Tabooing 'Agent' for Prosaic Alignment | Hjalmar_Wijk | 3y | 10
68 | Truthful LMs as a warm-up for aligned AGI | Jacob_Hilton | 11mo | 14

Interpretability (ML & AI) (52 posts)
Related tags: AI Success Models; Conservatism (AI); Principal-Agent Problems; Market making (AI safety technique); Empiricism

Karma | Title | Author | Posted | Comments
422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
239 | The Plan - 2022 Update | johnswentworth | 19d | 33
227 | Chris Olah’s views on AGI safety | evhub | 3y | 38
142 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
132 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
124 | A Longlist of Theories of Impact for Interpretability | Neel Nanda | 9mo | 29
117 | A transparency and interpretability tech tree | evhub | 6mo | 10
109 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
102 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
101 | More Recent Progress in the Theory of Neural Networks | jylin04 | 2mo | 6
83 | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9
81 | An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers | Neel Nanda | 2mo | 5
81 | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14