17 posts
Corrigibility
2017-2019 AI Alignment Prize
Petrov Day
3 posts
Treacherous Turn
Programming
Karma | Title | Author | Age | Comments
131 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
101 | Announcement: AI alignment prize round 3 winners and next round | cousin_it | 4y | 7
90 | Announcement: AI alignment prize round 4 winners | cousin_it | 3y | 41
55 | Corrigibility | paulfchristiano | 4y | 7
41 | Can corrigibility be learned safely? | Wei_Dai | 4y | 115
39 | Solve Corrigibility Week | Logan Riggs | 1y | 21
38 | Do what we mean vs. do what we say | Rohin Shah | 4y | 14
29 | Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence. | RyanCarey | 4y | 1
28 | Corrigibility doesn't always have a good action to take | Stuart_Armstrong | 4y | 0
27 | Petrov corrigibility | Stuart_Armstrong | 4y | 10
26 | Formalizing Policy-Modification Corrigibility | TurnTrout | 1y | 6
17 | On corrigibility and its basin | Donald Hobson | 6mo | 3
17 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
15 | A first look at the hard problem of corrigibility | jessicata | 7y | 0
70 | A Gym Gridworld Environment for the Treacherous Turn | Michaël Trazzi | 4y | 9
40 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6
27 | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0