Machine Learning (ML) (51 posts)
Related tags: DeepMind; OpenAI; Truth, Semantics, & Meaning; Lottery Ticket Hypothesis; Honesty; Anthropic; Map and Territory; Calibration

Karma | Title | Author | Posted | Comments
410 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
253 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
173 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
169 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
156 | Understanding “Deep Double Descent” | evhub | 3y | 51
140 | Clarifying AI X-risk | zac_kenton | 1mo | 23
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
101 | Paper: Teaching GPT3 to express uncertainty in words | Owain_Evans | 6mo | 7
97 | Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible | Sam Bowman | 3mo | 6
94 | Safety Implications of LeCun's path to machine intelligence | Ivan Vendrov | 5mo | 16
74 | Paper: Discovering novel algorithms with AlphaTensor [Deepmind] | LawrenceC | 2mo | 18
71 | Tabooing 'Agent' for Prosaic Alignment | Hjalmar_Wijk | 3y | 10
68 | Truthful LMs as a warm-up for aligned AGI | Jacob_Hilton | 11mo | 14

Interpretability (ML & AI) (52 posts)
Related tags: AI Success Models; Conservatism (AI); Principal-Agent Problems; Market making (AI safety technique); Empiricism

Karma | Title | Author | Posted | Comments
422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
239 | The Plan - 2022 Update | johnswentworth | 19d | 33
227 | Chris Olah’s views on AGI safety | evhub | 3y | 38
142 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
132 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
124 | A Longlist of Theories of Impact for Interpretability | Neel Nanda | 9mo | 29
117 | A transparency and interpretability tech tree | evhub | 6mo | 10
109 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
102 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
101 | More Recent Progress in the Theory of Neural Networks | jylin04 | 2mo | 6
83 | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9
81 | An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers | Neel Nanda | 2mo | 5
81 | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14