17 posts
Tags: Corrigibility, 2017-2019 AI Alignment Prize, Petrov Day (3 posts), Treacherous Turn, Programming
Corrigibility Via Thought-Process Deference (Thane Ruthenis, 26d ago): 17 karma, 5 comments
Let's See You Write That Corrigibility Tag (Eliezer Yudkowsky, 6mo ago): 131 karma, 67 comments
On corrigibility and its basin (Donald Hobson, 6mo ago): 17 karma, 3 comments
Solve Corrigibility Week (Logan Riggs, 1y ago): 39 karma, 21 comments
Formalizing Policy-Modification Corrigibility (TurnTrout, 1y ago): 26 karma, 6 comments
Announcement: AI alignment prize round 4 winners (cousin_it, 3y ago): 90 karma, 41 comments
Announcement: AI alignment prize round 3 winners and next round (cousin_it, 4y ago): 101 karma, 7 comments
Corrigibility (paulfchristiano, 4y ago): 55 karma, 7 comments
Do what we mean vs. do what we say (Rohin Shah, 4y ago): 38 karma, 14 comments
Can corrigibility be learned safely? (Wei_Dai, 4y ago): 41 karma, 115 comments
Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence (RyanCarey, 4y ago): 29 karma, 1 comment
Corrigibility doesn't always have a good action to take (Stuart_Armstrong, 4y ago): 28 karma, 0 comments
Petrov corrigibility (Stuart_Armstrong, 4y ago): 27 karma, 10 comments
Corrigibility as Constrained Optimisation (Henrik Åslund, 3y ago): 15 karma, 3 comments
[Linkpost] Treacherous turns in the wild (Mark Xu, 1y ago): 40 karma, 6 comments
[AN #165]: When large models are more likely to lie (Rohin Shah, 1y ago): 27 karma, 0 comments
A Gym Gridworld Environment for the Treacherous Turn (Michaël Trazzi, 4y ago): 70 karma, 9 comments