Tree of Tags

Corrigibility / Treacherous Turn / Tripwire (48 posts)

Mild Optimization / Quantilization / Satisficer (18 posts)

24 · Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real" · joraine · 26d · 11 comments
38 · People care about each other even though they have imperfect motivational pointers? · TurnTrout · 1mo · 25 comments
17 · Corrigibility Via Thought-Process Deference · Thane Ruthenis · 26d · 5 comments
108 · Interpretability/Tool-ness/Alignment/Corrigibility are not Composable · johnswentworth · 4mo · 8 comments
127 · Let's See You Write That Corrigibility Tag · Eliezer Yudkowsky · 6mo · 67 comments
113 · A broad basin of attraction around human values? · Wei_Dai · 8mo · 16 comments
134 · Soares, Tallinn, and Yudkowsky discuss AGI cognition · So8res · 1y · 35 comments
17 · CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander] · berglund · 2mo · 1 comment
73 · Corrigibility Can Be VNM-Incoherent · TurnTrout · 1y · 24 comments
7 · Simple question about corrigibility and values in AI. · jmh · 1mo · 1 comment
27 · [Intro to brain-like-AGI safety] 14. Controlled AGI · Steven Byrnes · 7mo · 25 comments
16 · On corrigibility and its basin · Donald Hobson · 6mo · 3 comments
37 · Solve Corrigibility Week · Logan Riggs · 1y · 21 comments
25 · Formalizing Policy-Modification Corrigibility · TurnTrout · 1y · 6 comments
60 · Steam · abramdemski · 6mo · 9 comments
27 · Quantilizers and Generative Models · Adam Jermyn · 5mo · 5 comments
77 · Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability · TurnTrout · 1y · 8 comments
9 · Exploring Mild Behaviour in Embedded Agents · Megan Kinniment · 5mo · 3 comments
69 · When to use quantilization · RyanCarey · 3y · 5 comments
12 · Quantilizer ≡ Optimizer with a Bounded Amount of Output · itaibn0 · 1y · 4 comments
31 · Quantilizers maximize expected utility subject to a conservative cost constraint · jessicata · 7y · 0 comments
18 · Quantilal control for finite MDPs · Vanessa Kosoy · 4y · 0 comments
30 · In Praise of Maximizing – With Some Caveats · David Althaus · 7y · 19 comments
25 · Another view of quantilizers: avoiding Goodhart's Law · jessicata · 6y · 1 comment
41 · Satisficers want to become maximisers · Stuart_Armstrong · 11y · 68 comments
12 · Optimization Regularization through Time Penalty · Linda Linsefors · 3y · 4 comments
13 · Defining a limited satisficer · Stuart_Armstrong · 7y · 11 comments
3 · Is 'satificing' optimisation? · Riccardo Volpato · 2y · 3 comments