37 posts
Corrigibility
11 posts
Treacherous Turn
Tripwire
32
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
25
Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"
joraine
26d
11
13
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
0
Contrary to List of Lethality's point 22, alignment's door number 2
False Name, Esq.
6d
1
111
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
7
What is wrong with this approach to corrigibility?
Rafael Cosman
5mo
8
109
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
52
Corrigibility
paulfchristiano
4y
7
6
Simple question about corrigibility and values in AI.
jmh
1mo
1
21
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]
berglund
2mo
1
16
On corrigibility and its basin
Donald Hobson
6mo
3
20
Petrov corrigibility
Stuart_Armstrong
4y
10
5
Corrigible omniscient AI capable of making clones
Kaj_Sotala
7y
0
7
An Idea For Corrigible, Recursively Improving Math Oracles
jimrandomh
7y
0
118
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
14
Superintelligence 13: Capability control methods
KatjaGrace
8y
48
2
Corrigibility thoughts I: caring about multiple things
Stuart_Armstrong
5y
0
16
Superintelligence 11: The treacherous turn
KatjaGrace
8y
50
3
Corrigibility thoughts II: the robot operator
Stuart_Armstrong
5y
2
31
[Linkpost] Treacherous turns in the wild
Mark Xu
1y
6
23
[AN #165]: When large models are more likely to lie
Rohin Shah
1y
0
73
A Gym Gridworld Environment for the Treacherous Turn
Michaël Trazzi
4y
9
17
Any work on honeypots (to detect treacherous turn attempts)?
David Scott Krueger (formerly: capybaralet)
2y
4
36
A toy model of the treacherous turn
Stuart_Armstrong
6y
13
3
Corrigibility thoughts III: manipulating versus deceiving
Stuart_Armstrong
5y
0