Tree of Tags

48 posts: Corrigibility, Treacherous Turn, Tripwire

18 posts: Mild Optimization, Quantilization, Satisficer

Karma | Title | Author | Age | Comments
25 | Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real" | joraine | 26d | 11
32 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
13 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
109 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
105 | A broad basin of attraction around human values? | Wei_Dai | 8mo | 16
21 | CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander] | berglund | 2mo | 1
118 | Soares, Tallinn, and Yudkowsky discuss AGI cognition | So8res | 1y | 35
64 | Corrigibility Can Be VNM-Incoherent | TurnTrout | 1y | 24
6 | Simple question about corrigibility and values in AI. | jmh | 1mo | 1
26 | [Intro to brain-like-AGI safety] 14. Controlled AGI | Steven Byrnes | 7mo | 25
39 | Solve Corrigibility Week | Logan Riggs | 1y | 21
16 | On corrigibility and its basin | Donald Hobson | 6mo | 3
15 | Infernal Corrigibility, Fiendishly Difficult | David Udell | 6mo | 1
61 | Steam | abramdemski | 6mo | 9
24 | Quantilizers and Generative Models | Adam Jermyn | 5mo | 5
69 | Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability | TurnTrout | 1y | 8
21 | Exploring Mild Behaviour in Embedded Agents | Megan Kinniment | 5mo | 3
65 | When to use quantilization | RyanCarey | 3y | 5
10 | Quantilizer ≡ Optimizer with a Bounded Amount of Output | itaibn0 | 1y | 4
31 | In Praise of Maximizing – With Some Caveats | David Althaus | 7y | 19
25 | Quantilizers maximize expected utility subject to a conservative cost constraint | jessicata | 7y | 0
14 | Quantilal control for finite MDPs | Vanessa Kosoy | 4y | 0
11 | Optimization Regularization through Time Penalty | Linda Linsefors | 3y | 4
20 | Another view of quantilizers: avoiding Goodhart's Law | jessicata | 6y | 1
33 | Satisficers want to become maximisers | Stuart_Armstrong | 11y | 68
5 | Is 'satificing' optimisation? | Riccardo Volpato | 2y | 3
10 | Defining a limited satisficer | Stuart_Armstrong | 7y | 11