Machine Learning (ML) · 51 posts
Related tags: DeepMind; OpenAI; Truth, Semantics, & Meaning; Lottery Ticket Hypothesis; Honesty; Anthropic; Map and Territory; Calibration

Karma | Title | Author | Posted | Comments
364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
226 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
146 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
135 | Understanding “Deep Double Descent” | evhub | 3y | 51
125 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
102 | Clarifying AI X-risk | zac_kenton | 1mo | 23
96 | Paper: Teaching GPT3 to express uncertainty in words | Owain_Evans | 6mo | 7
89 | Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible | Sam Bowman | 3mo | 6
89 | Safety Implications of LeCun's path to machine intelligence | Ivan Vendrov | 5mo | 16
80 | Paper: Discovering novel algorithms with AlphaTensor [Deepmind] | LawrenceC | 2mo | 18
80 | Gradations of Inner Alignment Obstacles | abramdemski | 1y | 22
67 | Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda | Logan Riggs | 2y | 12
Interpretability (ML & AI) · 52 posts
Related tags: AI Success Models; Conservatism (AI); Principal-Agent Problems; Market making (AI safety technique); Empiricism

Karma | Title | Author | Posted | Comments
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
197 | Chris Olah’s views on AGI safety | evhub | 3y | 38
139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
136 | A transparency and interpretability tech tree | evhub | 6mo | 10
123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
118 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
106 | A Longlist of Theories of Impact for Interpretability | Neel Nanda | 9mo | 29
99 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
89 | Search versus design | Alex Flint | 2y | 41
78 | A positive case for how we might succeed at prosaic AI alignment | evhub | 1y | 47
78 | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9
78 | More Recent Progress in the Theory of Neural Networks | jylin04 | 2mo | 6