Branch 1 (4148 posts): AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law
Branch 2 (14574 posts): Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)

Top posts (first listing):

Karma | Title | Author | Posted | Comments
515 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
405 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
296 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
287 | What DALL-E 2 can and cannot do | Swimmer963 | 7mo | 305
275 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60
242 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
230 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
228 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
218 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
218 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
217 | Contra Hofstadter on GPT-3 Nonsense | rictic | 6mo | 22
217 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
216 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
215 | A Quick Guide to Confronting Doom | Ruby | 8mo | 36
Top posts (second listing):

Karma | Title | Author | Posted | Comments
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
249 | Humans are very reliable agents | alyssavance | 6mo | 35
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
216 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
194 | EfficientZero: How It Works | 1a3orn | 1y | 42
177 | Impossibility results for unbounded utilities | paulfchristiano | 10mo | 104
158 | Are wireheads happy? | Scott Alexander | 12y | 107
147 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
146 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
140 | Saving Time | Scott Garrabrant | 1y | 19
139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
138 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
131 | Humans are utility monsters | PhilGoetz | 9y | 217
129 | Embedded Agents | abramdemski | 4y | 41