57 posts
Instrumental Convergence
Deconfusion
Orthogonality Thesis
Gradient Hacking
Gradient Descent
66 posts
Corrigibility
Treacherous Turn
Mild Optimization
Quantilization
Satisficer
Tripwire
Karma | Title | Author | Age | Comments
57 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15
38 | Empowerment is (almost) All We Need | jacob_cannell | 1mo | 43
45 | Applications for Deconfusing Goal-Directedness | adamShimi | 1y | 3
2 | The Opportunity and Risks of Learning Human Values In-Context | Zachary Robertson | 10d | 4
10 | Is the Orthogonality Thesis true for humans? | Noosphere89 | 1mo | 18
59 | Instrumental convergence is what makes general intelligence possible | tailcalled | 1mo | 11
33 | A caveat to the Orthogonality Thesis | Wuschel Schulz | 1mo | 10
3 | [ASoT] Instrumental convergence is useful | Ulisse Mini | 1mo | 9
17 | Misalignment-by-default in multi-agent systems | Edouard Harris | 2mo | 8
112 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38
2 | Why Do AI researchers Rate the Probability of Doom So Low? | Aorou | 2mo | 6
22 | Instrumental convergence in single-agent systems | Edouard Harris | 2mo | 4
16 | Some real examples of gradient hacking | Oliver Sourbut | 1y | 8
98 | Coherence arguments imply a force for goal-directed behavior | KatjaGrace | 1y | 27
38 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
24 | Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real" | joraine | 26d | 11
17 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
-4 | Contrary to List of Lethality's point 22, alignment's door number 2 | False Name, Esq. | 6d | 1
108 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
0 | What is wrong with this approach to corrigibility? | Rafael Cosman | 5mo | 8
127 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
27 | Quantilizers and Generative Models | Adam Jermyn | 5mo | 5
134 | Soares, Tallinn, and Yudkowsky discuss AGI cognition | So8res | 1y | 35
52 | Corrigibility | paulfchristiano | 4y | 7
7 | Simple question about corrigibility and values in AI. | jmh | 1mo | 1
60 | Steam | abramdemski | 6mo | 9
9 | Exploring Mild Behaviour in Embedded Agents | Megan Kinniment | 5mo | 3
17 | CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander] | berglund | 2mo | 1