Tree of Tags

57 posts: Instrumental Convergence, Deconfusion, Orthogonality Thesis, Gradient Hacking, Gradient Descent

66 posts: Corrigibility, Treacherous Turn, Mild Optimization, Quantilization, Satisficer, Tripwire

6 (Extremely) Naive Gradient Hacking Doesn't Work

ojorgensen

9h

0

59 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

85 Instrumental convergence is what makes general intelligence possible

tailcalled

1mo

11

6 Assessing the Capabilities of ChatGPT through Success Rates

Zachary Robertson

7d

0

39 A caveat to the Orthogonality Thesis

Wuschel Schulz

1mo

10

34 Empowerment is (almost) All We Need

jacob_cannell

1mo

43

32 Instrumental convergence in single-agent systems

Edouard Harris

2mo

4

20 Instrumental convergence: scale and physical interactions

Edouard Harris

2mo

0

14 Is the Orthogonality Thesis true for humans?

Noosphere89

1mo

18

17 Misalignment-by-default in multi-agent systems

Edouard Harris

2mo

8

7 [ASoT] Instrumental convergence is useful

Ulisse Mini

1mo

9

239 Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace

3y

60

12 Why Do AI researchers Rate the Probability of Doom So Low?

Aorou

2mo

6

51 Hypothesis: gradient descent prefers general circuits

Quintin Pope

10mo

26

26 Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"

joraine

26d

11

4 Contrary to List of Lethality's point 22, alignment's door number 2

False Name, Esq.

6d

1

114 Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth

4mo

8

26 People care about each other even though they have imperfect motivational pointers?

TurnTrout

1mo

25

91 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

9 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

25 CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]

berglund

2mo

1

97 A broad basin of attraction around human values?

Wei_Dai

8mo

16

62 Steam

abramdemski

6mo

9

102 Soares, Tallinn, and Yudkowsky discuss AGI cognition

So8res

1y

35

33 Exploring Mild Behaviour in Embedded Agents

Megan Kinniment

5mo

3

21 Quantilizers and Generative Models

Adam Jermyn

5mo

5

61 Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

TurnTrout

1y

8

55 Corrigibility Can Be VNM-Incoherent

TurnTrout

1y

24