Tree of Tags

57 posts: Instrumental Convergence, Deconfusion, Orthogonality Thesis, Gradient Hacking, Gradient Descent

66 posts: Corrigibility, Treacherous Turn, Mild Optimization, Quantilization, Satisficer, Tripwire

59 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

34 Empowerment is (almost) All We Need

jacob_cannell

1mo

43

27 Applications for Deconfusing Goal-Directedness

adamShimi

1y

3

0 The Opportunity and Risks of Learning Human Values In-Context

Zachary Robertson

10d

4

14 Is the Orthogonality Thesis true for humans?

Noosphere89

1mo

18

85 Instrumental convergence is what makes general intelligence possible

tailcalled

1mo

11

39 A caveat to the Orthogonality Thesis

Wuschel Schulz

1mo

10

7 [ASoT] Instrumental convergence is useful

Ulisse Mini

1mo

9

17 Misalignment-by-default in multi-agent systems

Edouard Harris

2mo

8

194 Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout

3y

38

12 Why Do AI researchers Rate the Probability of Doom So Low?

Aorou

2mo

6

32 Instrumental convergence in single-agent systems

Edouard Harris

2mo

4

14 Some real examples of gradient hacking

Oliver Sourbut

1y

8

78 Coherence arguments imply a force for goal-directed behavior

KatjaGrace

1y

27

26 People care about each other even though they have imperfect motivational pointers?

TurnTrout

1mo

25

26 Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"

joraine

26d

11

9 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

4 Contrary to List of Lethality's point 22, alignment's door number 2

False Name, Esq.

6d

1

114 Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth

4mo

8

14 What is wrong with this approach to corrigibility?

Rafael Cosman

5mo

8

91 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

21 Quantilizers and Generative Models

Adam Jermyn

5mo

5

102 Soares, Tallinn, and Yudkowsky discuss AGI cognition

So8res

1y

35

52 Corrigibility

paulfchristiano

4y

7

5 Simple question about corrigibility and values in AI.

jmh

1mo

1

62 Steam

abramdemski

6mo

9

33 Exploring Mild Behaviour in Embedded Agents

Megan Kinniment

5mo

3

25 CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]

berglund

2mo

1