Tree of Tags

17 posts Corrigibility 2017-2019 AI Alignment Prize Petrov Day

3 posts Treacherous Turn Programming

17 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

131 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

17 On corrigibility and its basin

Donald Hobson

6mo

3

39 Solve Corrigibility Week

Logan Riggs

1y

21

26 Formalizing Policy-Modification Corrigibility

TurnTrout

1y

6

90 Announcement: AI alignment prize round 4 winners

cousin_it

3y

41

101 Announcement: AI alignment prize round 3 winners and next round

cousin_it

4y

7

55 Corrigibility

paulfchristiano

4y

7

38 Do what we mean vs. do what we say

Rohin Shah

4y

14

41 Can corrigibility be learned safely?

Wei_Dai

4y

115

29 Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence.

RyanCarey

4y

1

28 Corrigibility doesn't always have a good action to take

Stuart_Armstrong

4y

0

27 Petrov corrigibility

Stuart_Armstrong

4y

10

15 Corrigibility as Constrained Optimisation

Henrik Åslund

3y

3

40 [Linkpost] Treacherous turns in the wild

Mark Xu

1y

6

27 [AN #165]: When large models are more likely to lie

Rohin Shah

1y

0

70 A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi

4y

9