Tree of Tags

57 posts Instrumental Convergence Deconfusion Orthogonality Thesis Gradient Hacking Gradient Descent

66 posts Corrigibility Treacherous Turn Mild Optimization Quantilization Satisficer Tripwire

58 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

36 Empowerment is (almost) All We Need

jacob_cannell

1mo

43

36 Applications for Deconfusing Goal-Directedness

adamShimi

1y

3

1 The Opportunity and Risks of Learning Human Values In-Context

Zachary Robertson

10d

4

12 Is the Orthogonality Thesis true for humans?

Noosphere89

1mo

18

72 Instrumental convergence is what makes general intelligence possible

tailcalled

1mo

11

36 A caveat to the Orthogonality Thesis

Wuschel Schulz

1mo

10

5 [ASoT] Instrumental convergence is useful

Ulisse Mini

1mo

9

17 Misalignment-by-default in multi-agent systems

Edouard Harris

2mo

8

153 Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout

3y

38

7 Why Do AI researchers Rate the Probability of Doom So Low?

Aorou

2mo

6

27 Instrumental convergence in single-agent systems

Edouard Harris

2mo

4

15 Some real examples of gradient hacking

Oliver Sourbut

1y

8

88 Coherence arguments imply a force for goal-directed behavior

KatjaGrace

1y

27

32 People care about each other even though they have imperfect motivational pointers?

TurnTrout

1mo

25

25 Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"

joraine

26d

11

13 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

0 Contrary to List of Lethality's point 22, alignment's door number 2

False Name, Esq.

6d

1

111 Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth

4mo

8

7 What is wrong with this approach to corrigibility?

Rafael Cosman

5mo

8

109 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

24 Quantilizers and Generative Models

Adam Jermyn

5mo

5

118 Soares, Tallinn, and Yudkowsky discuss AGI cognition

So8res

1y

35

52 Corrigibility

paulfchristiano

4y

7

6 Simple question about corrigibility and values in AI.

jmh

1mo

1

61 Steam

abramdemski

6mo

9

21 Exploring Mild Behaviour in Embedded Agents

Megan Kinniment

5mo

3

21 CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]

berglund

2mo

1