57 posts
Instrumental Convergence
Deconfusion
Orthogonality Thesis
Gradient Hacking
Gradient Descent
66 posts
Corrigibility
Treacherous Turn
Mild Optimization
Quantilization
Satisficer
Tripwire
171
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More
Ben Pace
3y
60
132
Sorting Pebbles Into Correct Heaps
Eliezer Yudkowsky
14y
109
125
Goal retention discussion with Eliezer
MaxTegmark
8y
26
112
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
98
Coherence arguments imply a force for goal-directed behavior
KatjaGrace
1y
27
97
Gradient hacking
evhub
3y
39
68
Gradient descent is not just more efficient genetic algorithms
leogao
1y
14
64
Distinguishing claims about training vs deployment
Richard_Ngo
1y
30
59
Instrumental convergence is what makes general intelligence possible
tailcalled
1mo
11
57
You can still fetch the coffee today if you're dead tomorrow
davidad
11d
15
50
Clarifying Power-Seeking and Instrumental Convergence
TurnTrout
3y
7
49
Review of 'Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More'
TurnTrout
1y
1
48
The Catastrophic Convergence Conjecture
TurnTrout
2y
15
45
Looking Deeper at Deconfusion
adamShimi
1y
13
134
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
127
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
113
A broad basin of attraction around human values?
Wei_Dai
8mo
16
108
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
77
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability
TurnTrout
1y
8
73
Corrigibility Can Be VNM-Incoherent
TurnTrout
1y
24
73
Introducing Corrigibility (an FAI research subfield)
So8res
8y
28
72
Boeing 737 MAX MCAS as an agent corrigibility failure
shminux
3y
3
69
When to use quantilization
RyanCarey
3y
5
66
A Gym Gridworld Environment for the Treacherous Turn
Michaël Trazzi
4y
9
63
Cake, or death!
Stuart_Armstrong
10y
13
60
Steam
abramdemski
6mo
9
52
Corrigibility
paulfchristiano
4y
7
41
Satisficers want to become maximisers
Stuart_Armstrong
11y
68