57 posts
Instrumental Convergence
Deconfusion
Orthogonality Thesis
Gradient Hacking
Gradient Descent
66 posts
Corrigibility
Treacherous Turn
Mild Optimization
Quantilization
Satisficer
Tripwire
59
You can still fetch the coffee today if you're dead tomorrow
davidad
11d
15
34
Empowerment is (almost) All We Need
jacob_cannell
1mo
43
27
Applications for Deconfusing Goal-Directedness
adamShimi
1y
3
0
The Opportunity and Risks of Learning Human Values In-Context
Zachary Robertson
10d
4
14
Is the Orthogonality Thesis true for humans?
Noosphere89
1mo
18
85
Instrumental convergence is what makes general intelligence possible
tailcalled
1mo
11
39
A caveat to the Orthogonality Thesis
Wuschel Schulz
1mo
10
7
[ASoT] Instrumental convergence is useful
Ulisse Mini
1mo
9
17
Misalignment-by-default in multi-agent systems
Edouard Harris
2mo
8
194
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
12
Why Do AI researchers Rate the Probability of Doom So Low?
Aorou
2mo
6
32
Instrumental convergence in single-agent systems
Edouard Harris
2mo
4
14
Some real examples of gradient hacking
Oliver Sourbut
1y
8
78
Coherence arguments imply a force for goal-directed behavior
KatjaGrace
1y
27
26
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
26
Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"
joraine
26d
11
9
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
4
Contrary to List of Lethality's point 22, alignment's door number 2
False Name, Esq.
6d
1
114
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
14
What is wrong with this approach to corrigibility?
Rafael Cosman
5mo
8
91
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
21
Quantilizers and Generative Models
Adam Jermyn
5mo
5
102
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
52
Corrigibility
paulfchristiano
4y
7
5
Simple question about corrigibility and values in AI.
jmh
1mo
1
62
Steam
abramdemski
6mo
9
33
Exploring Mild Behaviour in Embedded Agents
Megan Kinniment
5mo
3
25
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]
berglund
2mo
1