Tree of Tags

17 posts Corrigibility 2017-2019 AI Alignment Prize Petrov Day

3 posts Treacherous Turn Programming

87 Let's See You Write That Corrigibility Tag · Eliezer Yudkowsky · 6mo · 67

9 Corrigibility Via Thought-Process Deference · Thane Ruthenis · 26d · 5

39 Solve Corrigibility Week · Logan Riggs · 1y · 21

15 On corrigibility and its basin · Donald Hobson · 6mo · 3

20 Formalizing Policy-Modification Corrigibility · TurnTrout · 1y · 6

85 Announcement: AI alignment prize round 3 winners and next round · cousin_it · 4y · 7

58 Announcement: AI alignment prize round 4 winners · cousin_it · 3y · 41

49 Corrigibility · paulfchristiano · 4y · 7

30 Do what we mean vs. do what we say · Rohin Shah · 4y · 14

29 Can corrigibility be learned safely? · Wei_Dai · 4y · 115

15 Corrigibility as Constrained Optimisation · Henrik Åslund · 3y · 3

17 Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence. · RyanCarey · 4y · 1

13 Petrov corrigibility · Stuart_Armstrong · 4y · 10

10 Corrigibility doesn't always have a good action to take · Stuart_Armstrong · 4y · 0

19 [AN #165]: When large models are more likely to lie · Rohin Shah · 1y · 0

76 A Gym Gridworld Environment for the Treacherous Turn · Michaël Trazzi · 4y · 9

22 [Linkpost] Treacherous turns in the wild · Mark Xu · 1y · 6