4148 posts — tags: AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law
14574 posts — tags: Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 777 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205 |
| 724 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653 |
| 472 | Simulators | janus | 3mo | 103 |
| 364 | chinchilla's wild implications | nostalgebraist | 4mo | 114 |
| 364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34 |
| 351 | What DALL-E 2 can and cannot do | Swimmer963 | 7mo | 305 |
| 344 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83 |
| 338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39 |
| 336 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122 |
| 325 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257 |
| 319 | What failure looks like | paulfchristiano | 3y | 49 |
| 314 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33 |
| 310 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89 |
| 303 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60 |
| 276 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38 |
| 273 | EfficientZero: How It Works | 1a3orn | 1y | 42 |
| 258 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81 |
| 252 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 248 | Humans are very reliable agents | alyssavance | 6mo | 35 |
| 198 | Embedded Agents | abramdemski | 4y | 41 |
| 168 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14 |
| 167 | Are wireheads happy? | Scott Alexander | 12y | 107 |
| 157 | Impossibility results for unbounded utilities | paulfchristiano | 10mo | 104 |
| 147 | Can you control the past? | Joe Carlsmith | 1y | 93 |
| 145 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29 |
| 143 | Embedded Agency (full-text version) | Scott Garrabrant | 4y | 15 |
| 142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 139 | Coherent decisions imply consistent utilities | Eliezer Yudkowsky | 3y | 81 |