Tree of Tags

17 posts: Corrigibility, 2017-2019 AI Alignment Prize, Petrov Day

3 posts: Treacherous Turn, Programming

131 karma | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67 comments
101 karma | Announcement: AI alignment prize round 3 winners and next round | cousin_it | 4y | 7 comments
90 karma | Announcement: AI alignment prize round 4 winners | cousin_it | 3y | 41 comments
55 karma | Corrigibility | paulfchristiano | 4y | 7 comments
41 karma | Can corrigibility be learned safely? | Wei_Dai | 4y | 115 comments
39 karma | Solve Corrigibility Week | Logan Riggs | 1y | 21 comments
38 karma | Do what we mean vs. do what we say | Rohin Shah | 4y | 14 comments
29 karma | Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence. | RyanCarey | 4y | 1 comment
28 karma | Corrigibility doesn't always have a good action to take | Stuart_Armstrong | 4y | 0 comments
27 karma | Petrov corrigibility | Stuart_Armstrong | 4y | 10 comments
26 karma | Formalizing Policy-Modification Corrigibility | TurnTrout | 1y | 6 comments
17 karma | On corrigibility and its basin | Donald Hobson | 6mo | 3 comments
17 karma | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5 comments
15 karma | A first look at the hard problem of corrigibility | jessicata | 7y | 0 comments

70 karma | A Gym Gridworld Environment for the Treacherous Turn | Michaël Trazzi | 4y | 9 comments
40 karma | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6 comments
27 karma | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0 comments