64 posts
AI Risk
Corrigibility
Instrumental Convergence
Treacherous Turn
Programming
2017-2019 AI Alignment Prize
LessWrong Event Transcripts
Satisficer
Petrov Day
19 posts
Goodhart's Law
Modeling People
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
56 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15
108 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
83 | Niceness is unnatural | So8res | 2mo | 18
141 | Worlds Where Iterative Design Fails | johnswentworth | 3mo | 26
161 | AGI ruin scenarios are likely (and disjunctive) | So8res | 4mo | 37
61 | What does it mean for an AGI to be 'safe'? | So8res | 2mo | 32
93 | The alignment problem from a deep learning perspective | Richard_Ngo | 4mo | 13
91 | Oversight Misses 100% of Thoughts The AI Does Not Think | johnswentworth | 4mo | 49
51 | Eli's review of "Is power-seeking AI an existential risk?" | elifland | 2mo | 0
32 | Empowerment is (almost) All We Need | jacob_cannell | 1mo | 43
31 | Instrumental convergence in single-agent systems | Edouard Harris | 2mo | 4
87 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
24 | Proxy misspecification and the capabilities vs. value learning race | Sam Marks | 7mo | 1
10 | Reducing Goodhart: Announcement, Executive Summary | Charlie Steiner | 4mo | 0
204 | Goodhart Taxonomy | Scott Garrabrant | 4y | 33
31 | Introduction to Reducing Goodhart | Charlie Steiner | 1y | 10
23 | Models Modeling Models | Charlie Steiner | 1y | 5
22 | Competent Preferences | Charlie Steiner | 1y | 2
61 | Classifying specification problems as variants of Goodhart's Law | Vika | 3y | 5
64 | How does Gradient Descent Interact with Goodhart? | Scott Garrabrant | 3y | 19
37 | Defeating Goodhart and the "closest unblocked strategy" problem | Stuart_Armstrong | 3y | 15
33 | Does Bayes Beat Goodhart? | abramdemski | 3y | 26
35 | Bounding Goodhart's Law | eric_langlois | 4y | 2
23 | All I know is Goodhart | Stuart_Armstrong | 3y | 23
35 | Specification gaming examples in AI | Vika | 4y | 9
31 | Using expected utility for Good(hart) | Stuart_Armstrong | 4y | 5