64 posts
AI Risk
Corrigibility
Instrumental Convergence
Treacherous Turn
Programming
2017-2019 AI Alignment Prize
LessWrong Event Transcripts
Satisficer
Petrov Day
19 posts
Goodhart's Law
Modeling People
60
You can still fetch the coffee today if you're dead tomorrow
davidad
11d
15
147
Worlds Where Iterative Design Fails
johnswentworth
3mo
26
98
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak
28d
86
243
Counterarguments to the basic AI x-risk case
KatjaGrace
2mo
122
17
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
79
Oversight Misses 100% of Thoughts The AI Does Not Think
johnswentworth
4mo
49
18
Misalignment-by-default in multi-agent systems
Edouard Harris
2mo
8
123
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
83
What does it mean for an AGI to be 'safe'?
So8res
2mo
32
23
Instrumental convergence in single-agent systems
Edouard Harris
2mo
4
113
Niceness is unnatural
So8res
2mo
18
93
The alignment problem from a deep learning perspective
Richard_Ngo
4mo
13
135
AGI ruin scenarios are likely (and disjunctive)
So8res
4mo
37
24
Complex Systems for AI Safety [Pragmatic AI Safety #3]
Dan H
7mo
2
18
Reducing Goodhart: Announcement, Executive Summary
Charlie Steiner
4mo
0
38
Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1)
Davidmanheim
3y
5
5
The Three Levels of Goodhart's Curse
Scott Garrabrant
4y
0
32
Competent Preferences
Charlie Steiner
1y
2
51
Specification gaming examples in AI
Vika
4y
9
33
What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2)
Davidmanheim
3y
7
156
Goodhart Taxonomy
Scott Garrabrant
4y
33
25
Goodhart's Curse and Limitations on AI Alignment
Gordon Seidoh Worley
3y
18
51
Defeating Goodhart and the "closest unblocked strategy" problem
Stuart_Armstrong
3y
15
49
Introduction to Reducing Goodhart
Charlie Steiner
1y
10
55
Does Bayes Beat Goodhart?
abramdemski
3y
26
29
Non-Adversarial Goodhart and AI Risks
Davidmanheim
4y
11
17
Models Modeling Models
Charlie Steiner
1y
5
51
Bounding Goodhart's Law
eric_langlois
4y
2