37 posts
Corrigibility
11 posts
Treacherous Turn
Tripwire
26
Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"
joraine
26d
11
4
Contrary to List of Lethality's point 22, alignment's door number 2
False Name, Esq.
6d
1
114
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
26
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
91
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
9
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
25
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]
berglund
2mo
1
97
A broad basin of attraction around human values?
Wei_Dai
8mo
16
55
Corrigibility Can Be VNM-Incoherent
TurnTrout
1y
24
25
[Intro to brain-like-AGI safety] 14. Controlled AGI
Steven Byrnes
7mo
25
21
Infernal Corrigibility, Fiendishly Difficult
David Udell
6mo
1
5
Simple question about corrigibility and values in AI.
jmh
1mo
1
41
Solve Corrigibility Week
Logan Riggs
1y
21
14
What is wrong with this approach to corrigibility?
Rafael Cosman
5mo
8
102
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
20
[AN #165]: When large models are more likely to lie
Rohin Shah
1y
0
80
A Gym Gridworld Environment for the Treacherous Turn
Michaël Trazzi
4y
9
23
[Linkpost] Treacherous turns in the wild
Mark Xu
1y
6
16
Any work on honeypots (to detect treacherous turn attempts)?
David Scott Krueger (formerly: capybaralet)
2y
4
33
A toy model of the treacherous turn
Stuart_Armstrong
6y
13
12
Superintelligence 11: The treacherous turn
KatjaGrace
8y
50
9
Superintelligence 13: Capability control methods
KatjaGrace
8y
48
2
Corrigibility thoughts III: manipulating versus deceiving
Stuart_Armstrong
5y
0
2
Corrigibility thoughts II: the robot operator
Stuart_Armstrong
5y
2
1
Corrigibility thoughts I: caring about multiple things
Stuart_Armstrong
5y
0