Tree of Tags

48 posts: Corrigibility, Treacherous Turn, Tripwire

118 Soares, Tallinn, and Yudkowsky discuss AGI cognition (So8res, 1y, 35 comments)
111 Interpretability/Tool-ness/Alignment/Corrigibility are not Composable (johnswentworth, 4mo, 8 comments)
109 Let's See You Write That Corrigibility Tag (Eliezer Yudkowsky, 6mo, 67 comments)
105 A broad basin of attraction around human values? (Wei_Dai, 8mo, 16 comments)
73 A Gym Gridworld Environment for the Treacherous Turn (Michaël Trazzi, 4y, 9 comments)
64 Corrigibility Can Be VNM-Incoherent (TurnTrout, 1y, 24 comments)
60 Boeing 737 MAX MCAS as an agent corrigibility failure (shminux, 3y, 3 comments)
52 Corrigibility (paulfchristiano, 4y, 7 comments)
52 Introducing Corrigibility (an FAI research subfield) (So8res, 8y, 28 comments)
46 Cake, or death! (Stuart_Armstrong, 10y, 13 comments)
39 Solve Corrigibility Week (Logan Riggs, 1y, 21 comments)
36 A toy model of the treacherous turn (Stuart_Armstrong, 6y, 13 comments)
35 Can corrigibility be learned safely? (Wei_Dai, 4y, 115 comments)
34 Do what we mean vs. do what we say (Rohin Shah, 4y, 14 comments)

18 posts: Mild Optimization, Quantilization, Satisficer

69 Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability (TurnTrout, 1y, 8 comments)
65 When to use quantilization (RyanCarey, 3y, 5 comments)
61 Steam (abramdemski, 6mo, 9 comments)
33 Satisficers want to become maximisers (Stuart_Armstrong, 11y, 68 comments)
31 In Praise of Maximizing – With Some Caveats (David Althaus, 7y, 19 comments)
25 Quantilizers maximize expected utility subject to a conservative cost constraint (jessicata, 7y, 0 comments)
24 Quantilizers and Generative Models (Adam Jermyn, 5mo, 5 comments)
21 Exploring Mild Behaviour in Embedded Agents (Megan Kinniment, 5mo, 3 comments)
20 Another view of quantilizers: avoiding Goodhart's Law (jessicata, 6y, 1 comment)
14 Quantilal control for finite MDPs (Vanessa Kosoy, 4y, 0 comments)
11 Optimization Regularization through Time Penalty (Linda Linsefors, 3y, 4 comments)
10 Defining a limited satisficer (Stuart_Armstrong, 7y, 11 comments)
10 Quantilizer ≡ Optimizer with a Bounded Amount of Output (itaibn0, 1y, 4 comments)
8 Creating a satisficer (Stuart_Armstrong, 7y, 26 comments)