20 posts
Corrigibility
Treacherous Turn
Programming
2017-2019 AI Alignment Prize
Petrov Day
9 posts
Instrumental Convergence
Satisficer
LessWrong Event Transcripts
87
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
9
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
39
Solve Corrigibility Week
Logan Riggs
1y
21
15
On corrigibility and its basin
Donald Hobson
6mo
3
20
Formalizing Policy-Modification Corrigibility
TurnTrout
1y
6
85
Announcement: AI alignment prize round 3 winners and next round
cousin_it
4y
7
19
[AN #165]: When large models are more likely to lie
Rohin Shah
1y
0
76
A Gym Gridworld Environment for the Treacherous Turn
Michaël Trazzi
4y
9
22
[Linkpost] Treacherous turns in the wild
Mark Xu
1y
6
58
Announcement: AI alignment prize round 4 winners
cousin_it
3y
41
49
Corrigibility
paulfchristiano
4y
7
30
Do what we mean vs. do what we say
Rohin Shah
4y
14
29
Can corrigibility be learned safely?
Wei_Dai
4y
115
15
Corrigibility as Constrained Optimisation
Henrik Åslund
3y
3
56
You can still fetch the coffee today if you're dead tomorrow
davidad
11d
15
32
Empowerment is (almost) All We Need
jacob_cannell
1mo
43
227
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More
Ben Pace
3y
60
58
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability
TurnTrout
1y
8
183
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
60
Environmental Structure Can Cause Instrumental Convergence
TurnTrout
1y
44
33
Clarifying Power-Seeking and Instrumental Convergence
TurnTrout
3y
7
14
MDP models are determined by the agent architecture and the environmental dynamics
TurnTrout
1y
34
22
Power as Easily Exploitable Opportunities
TurnTrout
2y
5