Tree of Tags

17 posts Corrigibility 2017-2019 AI Alignment Prize Petrov Day

3 posts Treacherous Turn Programming

13 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

109 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

39 Solve Corrigibility Week

Logan Riggs

1y

21

16 On corrigibility and its basin

Donald Hobson

6mo

3

23 Formalizing Policy-Modification Corrigibility

TurnTrout

1y

6

93 Announcement: AI alignment prize round 3 winners and next round

cousin_it

4y

7

74 Announcement: AI alignment prize round 4 winners

cousin_it

3y

41

52 Corrigibility

paulfchristiano

4y

7

34 Do what we mean vs. do what we say

Rohin Shah

4y

14

35 Can corrigibility be learned safely?

Wei_Dai

4y

115

23 Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence.

RyanCarey

4y

1

20 Petrov corrigibility

Stuart_Armstrong

4y

10

19 Corrigibility doesn't always have a good action to take

Stuart_Armstrong

4y

0

15 Corrigibility as Constrained Optimisation

Henrik Åslund

3y

3

23 [AN #165]: When large models are more likely to lie

Rohin Shah

1y

0

31 [Linkpost] Treacherous turns in the wild

Mark Xu

1y

6

73 A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi

4y

9