48 posts
Corrigibility
Treacherous Turn
Tripwire
18 posts
Mild Optimization
Quantilization
Satisficer
118
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
111
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
109
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
105
A broad basin of attraction around human values?
Wei_Dai
8mo
16
73
A Gym Gridworld Environment for the Treacherous Turn
Michaël Trazzi
4y
9
64
Corrigibility Can Be VNM-Incoherent
TurnTrout
1y
24
60
Boeing 737 MAX MCAS as an agent corrigibility failure
shminux
3y
3
52
Corrigibility
paulfchristiano
4y
7
52
Introducing Corrigibility (an FAI research subfield)
So8res
8y
28
46
Cake, or death!
Stuart_Armstrong
10y
13
39
Solve Corrigibility Week
Logan Riggs
1y
21
36
A toy model of the treacherous turn
Stuart_Armstrong
6y
13
35
Can corrigibility be learned safely?
Wei_Dai
4y
115
34
Do what we mean vs. do what we say
Rohin Shah
4y
14
69
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability
TurnTrout
1y
8
65
When to use quantilization
RyanCarey
3y
5
61
Steam
abramdemski
6mo
9
33
Satisficers want to become maximisers
Stuart_Armstrong
11y
68
31
In Praise of Maximizing – With Some Caveats
David Althaus
7y
19
25
Quantilizers maximize expected utility subject to a conservative cost constraint
jessicata
7y
0
24
Quantilizers and Generative Models
Adam Jermyn
5mo
5
21
Exploring Mild Behaviour in Embedded Agents
Megan Kinniment
5mo
3
20
Another view of quantilizers: avoiding Goodhart's Law
jessicata
6y
1
14
Quantilal control for finite MDPs
Vanessa Kosoy
4y
0
11
Optimization Regularization through Time Penalty
Linda Linsefors
3y
4
10
Defining a limited satisficer
Stuart_Armstrong
7y
11
10
Quantilizer ≡ Optimizer with a Bounded Amount of Output
itaibn0
1y
4
8
Creating a satisficer
Stuart_Armstrong
7y
26