57 posts
Instrumental Convergence
Deconfusion
Orthogonality Thesis
Gradient Hacking
Gradient Descent
66 posts
Corrigibility
Treacherous Turn
Mild Optimization
Quantilization
Satisficer
Tripwire
6
(Extremely) Naive Gradient Hacking Doesn't Work
ojorgensen
9h
0
59
You can still fetch the coffee today if you're dead tomorrow
davidad
11d
15
85
Instrumental convergence is what makes general intelligence possible
tailcalled
1mo
11
6
Assessing the Capabilities of ChatGPT through Success Rates
Zachary Robertson
7d
0
39
A caveat to the Orthogonality Thesis
Wuschel Schulz
1mo
10
34
Empowerment is (almost) All We Need
jacob_cannell
1mo
43
32
Instrumental convergence in single-agent systems
Edouard Harris
2mo
4
20
Instrumental convergence: scale and physical interactions
Edouard Harris
2mo
0
14
Is the Orthogonality Thesis true for humans?
Noosphere89
1mo
18
17
Misalignment-by-default in multi-agent systems
Edouard Harris
2mo
8
7
[ASoT] Instrumental convergence is useful
Ulisse Mini
1mo
9
239
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More
Ben Pace
3y
60
12
Why Do AI researchers Rate the Probability of Doom So Low?
Aorou
2mo
6
51
Hypothesis: gradient descent prefers general circuits
Quintin Pope
10mo
26
26
Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"
joraine
26d
11
4
Contrary to List of Lethality's point 22, alignment's door number 2
False Name, Esq.
6d
1
114
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
26
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
91
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
9
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
25
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]
berglund
2mo
1
97
A broad basin of attraction around human values?
Wei_Dai
8mo
16
62
Steam
abramdemski
6mo
9
102
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
33
Exploring Mild Behaviour in Embedded Agents
Megan Kinniment
5mo
3
21
Quantilizers and Generative Models
Adam Jermyn
5mo
5
61
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability
TurnTrout
1y
8
55
Corrigibility Can Be VNM-Incoherent
TurnTrout
1y
24