Tags (51 posts): Machine Learning (ML) · DeepMind · OpenAI · Truth, Semantics, & Meaning · Lottery Ticket Hypothesis · Honesty · Anthropic · Map and Territory · Calibration

Karma | Title | Author | Age | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
199 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
123 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
114 | Understanding “Deep Double Descent” | evhub | 3y | 51
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
100 | Gradations of Inner Alignment Obstacles | abramdemski | 1y | 22
91 | Paper: Teaching GPT3 to express uncertainty in words | Owain_Evans | 6mo | 7
86 | Paper: Discovering novel algorithms with AlphaTensor [Deepmind] | LawrenceC | 2mo | 18
84 | Safety Implications of LeCun's path to machine intelligence | Ivan Vendrov | 5mo | 16
81 | Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible | Sam Bowman | 3mo | 6
77 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
75 | SGD's Bias | johnswentworth | 1y | 16
74 | Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda | Logan Riggs | 2y | 12

Tags (52 posts): Interpretability (ML & AI) · AI Success Models · Conservatism (AI) · Principal-Agent Problems · Market making (AI safety technique) · Empiricism

Karma | Title | Author | Age | Comments
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
183 | The Plan - 2022 Update | johnswentworth | 19d | 33
167 | Chris Olah’s views on AGI safety | evhub | 3y | 38
155 | A transparency and interpretability tech tree | evhub | 6mo | 10
146 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
134 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
114 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
113 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
102 | Search versus design | Alex Flint | 2y | 41
88 | A Longlist of Theories of Impact for Interpretability | Neel Nanda | 9mo | 29
85 | A positive case for how we might succeed at prosaic AI alignment | evhub | 1y | 47
73 | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9
69 | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6
65 | Solving the whole AGI control problem, version 0.0001 | Steven Byrnes | 1y | 7