Tags (4148 posts): AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law

Tags (14574 posts): Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
Top posts:

Karma | Title | Author | Posted | Comments
1043 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
1039 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
808 | Simulators | janus | 3mo | 103
531 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
521 | chinchilla's wild implications | nostalgebraist | 4mo | 114
455 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
446 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
437 | What failure looks like | paulfchristiano | 3y | 49
436 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
432 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
432 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
415 | What DALL-E 2 can and cannot do | Swimmer963 | 7mo | 305
404 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
394 | Why I think strong general AI is coming soon | porby | 2mo | 126
352 | EfficientZero: How It Works | 1a3orn | 1y | 42
300 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
286 | Reward is not the optimization target | TurnTrout | 4mo | 97
271 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
267 | Embedded Agents | abramdemski | 4y | 41
247 | Humans are very reliable agents | alyssavance | 6mo | 35
202 | Embedded Agency (full-text version) | Scott Garrabrant | 4y | 15
190 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
176 | Are wireheads happy? | Scott Alexander | 12y | 107
170 | Can you control the past? | Joe Carlsmith | 1y | 93
161 | Coherent decisions imply consistent utilities | Eliezer Yudkowsky | 3y | 81
158 | Being a Robust Agent | Raemon | 4y | 32
155 | why assume AGIs will optimize for fixed goals? | nostalgebraist | 6mo | 52
146 | Newcomb's Problem and Regret of Rationality | Eliezer Yudkowsky | 14y | 614