Tree of Tags

37 posts Corrigibility

11 posts Treacherous Turn Tripwire

Karma | Title | Author | Posted | Comments
32 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
25 | Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real" | joraine | 26d | 11
13 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
0 | Contrary to List of Lethality's point 22, alignment's door number 2 | False Name, Esq. | 6d | 1
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
7 | What is wrong with this approach to corrigibility? | Rafael Cosman | 5mo | 8
109 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
52 | Corrigibility | paulfchristiano | 4y | 7
6 | Simple question about corrigibility and values in AI. | jmh | 1mo | 1
21 | CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander] | berglund | 2mo | 1
16 | On corrigibility and its basin | Donald Hobson | 6mo | 3
20 | Petrov corrigibility | Stuart_Armstrong | 4y | 10
5 | Corrigible omniscient AI capable of making clones | Kaj_Sotala | 7y | 0
7 | An Idea For Corrigible, Recursively Improving Math Oracles | jimrandomh | 7y | 0
118 | Soares, Tallinn, and Yudkowsky discuss AGI cognition | So8res | 1y | 35
14 | Superintelligence 13: Capability control methods | KatjaGrace | 8y | 48
2 | Corrigibility thoughts I: caring about multiple things | Stuart_Armstrong | 5y | 0
16 | Superintelligence 11: The treacherous turn | KatjaGrace | 8y | 50
3 | Corrigibility thoughts II: the robot operator | Stuart_Armstrong | 5y | 2
31 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6
23 | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0
73 | A Gym Gridworld Environment for the Treacherous Turn | Michaël Trazzi | 4y | 9
17 | Any work on honeypots (to detect treacherous turn attempts)? | David Scott Krueger (formerly: capybaralet) | 2y | 4
36 | A toy model of the treacherous turn | Stuart_Armstrong | 6y | 13
3 | Corrigibility thoughts III: manipulating versus deceiving | Stuart_Armstrong | 5y | 0