Tree of Tags

48 posts: Corrigibility, Treacherous Turn, Tripwire

18 posts: Mild Optimization, Quantilization, Satisficer

114 Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth

4mo

8

102 Soares, Tallinn, and Yudkowsky discuss AGI cognition

So8res

1y

35

97 A broad basin of attraction around human values?

Wei_Dai

8mo

16

91 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

80 A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi

4y

9

55 Corrigibility Can Be VNM-Incoherent

TurnTrout

1y

24

52 Corrigibility

paulfchristiano

4y

7

48 Boeing 737 MAX MCAS as an agent corrigibility failure

shminux

3y

3

41 Solve Corrigibility Week

Logan Riggs

1y

21

33 A toy model of the treacherous turn

Stuart_Armstrong

6y

13

31 Do what we mean vs. do what we say

Rohin Shah

4y

14

31 Introducing Corrigibility (an FAI research subfield)

So8res

8y

28

29 Cake, or death!

Stuart_Armstrong

10y

13

29 Can corrigibility be learned safely?

Wei_Dai

4y

115

62 Steam

abramdemski

6mo

9

61 When to use quantilization

RyanCarey

3y

5

61 Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

TurnTrout

1y

8

33 Exploring Mild Behaviour in Embedded Agents

Megan Kinniment

5mo

3

32 In Praise of Maximizing – With Some Caveats

David Althaus

7y

19

25 Satisficers want to become maximisers

Stuart_Armstrong

11y

68

21 Quantilizers and Generative Models

Adam Jermyn

5mo

5

19 Quantilizers maximize expected utility subject to a conservative cost constraint

jessicata

7y

0

15 Another view of quantilizers: avoiding Goodhart's Law

jessicata

6y

1

10 Quantilal control for finite MDPs

Vanessa Kosoy

4y

0

10 Optimization Regularization through Time Penalty

Linda Linsefors

3y

4

8 Quantilizer ≡ Optimizer with a Bounded Amount of Output

itaibn0

1y

4

7 Defining a limited satisficer

Stuart_Armstrong

7y

11

7 Is 'satisficing' optimisation?

Riccardo Volpato

2y

3