Machine Learning (ML) · 51 posts
Related tags: DeepMind; OpenAI; Truth, Semantics, & Meaning; Lottery Ticket Hypothesis; Honesty; Anthropic; Map and Territory; Calibration

Karma | Title | Author | Posted | Comments
364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
226 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
146 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
135 | Understanding “Deep Double Descent” | evhub | 3y | 51
125 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
102 | Clarifying AI X-risk | zac_kenton | 1mo | 23
96 | Paper: Teaching GPT3 to express uncertainty in words | Owain_Evans | 6mo | 7
89 | Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible | Sam Bowman | 3mo | 6
89 | Safety Implications of LeCun's path to machine intelligence | Ivan Vendrov | 5mo | 16
80 | Paper: Discovering novel algorithms with AlphaTensor [Deepmind] | LawrenceC | 2mo | 18
80 | Gradations of Inner Alignment Obstacles | abramdemski | 1y | 22
67 | Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda | Logan Riggs | 2y | 12
Interpretability (ML & AI) · 52 posts
Related tags: AI Success Models; Conservatism (AI); Principal-Agent Problems; Market making (AI safety technique); Empiricism

Karma | Title | Author | Posted | Comments
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
197 | Chris Olah’s views on AGI safety | evhub | 3y | 38
139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
136 | A transparency and interpretability tech tree | evhub | 6mo | 10
123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
118 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
106 | A Longlist of Theories of Impact for Interpretability | Neel Nanda | 9mo | 29
99 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
89 | Search versus design | Alex Flint | 2y | 41
78 | A positive case for how we might succeed at prosaic AI alignment | evhub | 1y | 47
78 | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9
78 | More Recent Progress in the Theory of Neural Networks | jylin04 | 2mo | 6