Tree of Tags

48 posts: Corrigibility, Treacherous Turn, Tripwire

134 · Soares, Tallinn, and Yudkowsky discuss AGI cognition · So8res · 1y · 35
127 · Let's See You Write That Corrigibility Tag · Eliezer Yudkowsky · 6mo · 67
113 · A broad basin of attraction around human values? · Wei_Dai · 8mo · 16
108 · Interpretability/Tool-ness/Alignment/Corrigibility are not Composable · johnswentworth · 4mo · 8
73 · Corrigibility Can Be VNM-Incoherent · TurnTrout · 1y · 24
73 · Introducing Corrigibility (an FAI research subfield) · So8res · 8y · 28
72 · Boeing 737 MAX MCAS as an agent corrigibility failure · shminux · 3y · 3
66 · A Gym Gridworld Environment for the Treacherous Turn · Michaël Trazzi · 4y · 9
63 · Cake, or death! · Stuart_Armstrong · 10y · 13
52 · Corrigibility · paulfchristiano · 4y · 7
41 · Can corrigibility be learned safely? · Wei_Dai · 4y · 115
39 · [Linkpost] Treacherous turns in the wild · Mark Xu · 1y · 6
39 · A toy model of the treacherous turn · Stuart_Armstrong · 6y · 13
38 · People care about each other even though they have imperfect motivational pointers? · TurnTrout · 1mo · 25

18 posts: Mild Optimization, Quantilization, Satisficer

77 · Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability · TurnTrout · 1y · 8
69 · When to use quantilization · RyanCarey · 3y · 5
60 · Steam · abramdemski · 6mo · 9
41 · Satisficers want to become maximisers · Stuart_Armstrong · 11y · 68
31 · Quantilizers maximize expected utility subject to a conservative cost constraint · jessicata · 7y · 0
30 · In Praise of Maximizing – With Some Caveats · David Althaus · 7y · 19
27 · Quantilizers and Generative Models · Adam Jermyn · 5mo · 5
25 · Another view of quantilizers: avoiding Goodhart's Law · jessicata · 6y · 1
18 · Quantilal control for finite MDPs · Vanessa Kosoy · 4y · 0
13 · Defining a limited satisficer · Stuart_Armstrong · 7y · 11
12 · Quantilizer ≡ Optimizer with a Bounded Amount of Output · itaibn0 · 1y · 4
12 · Optimization Regularization through Time Penalty · Linda Linsefors · 3y · 4
11 · Creating a satisficer · Stuart_Armstrong · 7y · 26
9 · Exploring Mild Behaviour in Embedded Agents · Megan Kinniment · 5mo · 3