83 posts
AI Risk
Goodhart's Law
Corrigibility
Instrumental Convergence
Treacherous Turn
Programming
2017-2019 AI Alignment Prize
Satisficer
LessWrong Event Transcripts
Modeling People
Petrov Day
83 posts
World Optimization
Threat Models
Existential Risk
Coordination / Cooperation
Academic Papers
AI Safety Camp
Practical
Ethics & Morality
Symbol Grounding
Security Mindset
Sharp Left Turn
Fiction
986
AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
6mo
653
429
Counterarguments to the basic AI x-risk case
KatjaGrace
2mo
122
227
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More
Ben Pace
3y
60
204
Goodhart Taxonomy
Scott Garrabrant
4y
33
183
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
161
AGI ruin scenarios are likely (and disjunctive)
So8res
4mo
37
141
Worlds Where Iterative Design Fails
johnswentworth
3mo
26
108
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak
28d
86
102
The Main Sources of AI Risk?
Daniel Kokotajlo
3y
25
98
AI Safety "Success Stories"
Wei_Dai
3y
27
97
What can the principal-agent literature tell us about AI risk?
Alexis Carlier
2y
31
93
The alignment problem from a deep learning perspective
Richard_Ngo
4mo
13
91
Oversight Misses 100% of Thoughts The AI Does Not Think
johnswentworth
4mo
49
87
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
517
It Looks Like You're Trying To Take Over The World
gwern
9mo
125
416
What failure looks like
paulfchristiano
3y
49
413
How To Get Into Independent Research On Alignment/Agency
johnswentworth
1y
33
292
A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res
6mo
48
284
Six Dimensions of Operational Adequacy in AGI Projects
Eliezer Yudkowsky
6mo
65
252
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Andrew_Critch
1y
60
240
Another (outer) alignment failure story
paulfchristiano
1y
38
207
Some AI research areas and their relevance to existential safety
Andrew_Critch
2y
40
201
Reshaping the AI Industry
Thane Ruthenis
6mo
34
189
The next decades might be wild
Marius Hobbhahn
5d
21
177
Morality is Scary
Wei_Dai
1y
125
144
An Update on Academia vs. Industry (one year into my faculty job)
David Scott Krueger (formerly: capybaralet)
3mo
18
142
Clarifying “What failure looks like”
Sam Clarke
2y
14
136
AI coordination needs clear wins
evhub
3mo
15