64 posts
AI Risk
Corrigibility
Instrumental Convergence
Treacherous Turn
Programming
2017-2019 AI Alignment Prize
LessWrong Event Transcripts
Satisficer
Petrov Day
19 posts
Goodhart's Law
Modeling People
724
AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
6mo
653
336
Counterarguments to the basic AI x-risk case
KatjaGrace
2mo
122
205
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More
Ben Pace
3y
60
153
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
148
AGI ruin scenarios are likely (and disjunctive)
So8res
4mo
37
144
Worlds Where Iterative Design Fails
johnswentworth
3mo
26
111
AI Safety "Success Stories"
Wei_Dai
3y
27
109
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
105
The Main Sources of AI Risk?
Daniel Kokotajlo
3y
25
103
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak
28d
86
102
What can the principal-agent literature tell us about AI risk?
Alexis Carlier
2y
31
98
Niceness is unnatural
So8res
2mo
18
93
The alignment problem from a deep learning perspective
Richard_Ngo
4mo
13
93
Announcement: AI alignment prize round 3 winners and next round
cousin_it
4y
7
180
Goodhart Taxonomy
Scott Garrabrant
4y
33
70
Classifying specification problems as variants of Goodhart's Law
Vika
3y
5
68
How does Gradient Descent Interact with Goodhart?
Scott Garrabrant
3y
19
44
Defeating Goodhart and the "closest unblocked strategy" problem
Stuart_Armstrong
3y
15
44
Does Bayes Beat Goodhart?
abramdemski
3y
26
43
Specification gaming examples in AI
Vika
4y
9
43
Bounding Goodhart's Law
eric_langlois
4y
2
42
Using expected utility for Good(hart)
Stuart_Armstrong
4y
5
40
Introduction to Reducing Goodhart
Charlie Steiner
1y
10
30
Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1)
Davidmanheim
3y
5
28
All I know is Goodhart
Stuart_Armstrong
3y
23
27
Competent Preferences
Charlie Steiner
1y
2
26
What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2)
Davidmanheim
3y
7
24
Goodhart's Curse and Limitations on AI Alignment
Gordon Seidoh Worley
3y
18