Corrigibility, Treacherous Turn, Tripwire (48 posts)
Mild Optimization, Quantilization, Satisficer (18 posts)
Karma | Title | Author | Age | Comments
114 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
102 | Soares, Tallinn, and Yudkowsky discuss AGI cognition | So8res | 1y | 35
97 | A broad basin of attraction around human values? | Wei_Dai | 8mo | 16
91 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
80 | A Gym Gridworld Environment for the Treacherous Turn | Michaël Trazzi | 4y | 9
55 | Corrigibility Can Be VNM-Incoherent | TurnTrout | 1y | 24
52 | Corrigibility | paulfchristiano | 4y | 7
48 | Boeing 737 MAX MCAS as an agent corrigibility failure | shminux | 3y | 3
41 | Solve Corrigibility Week | Logan Riggs | 1y | 21
33 | A toy model of the treacherous turn | Stuart_Armstrong | 6y | 13
31 | Do what we mean vs. do what we say | Rohin Shah | 4y | 14
31 | Introducing Corrigibility (an FAI research subfield) | So8res | 8y | 28
29 | Cake, or death! | Stuart_Armstrong | 10y | 13
29 | Can corrigibility be learned safely? | Wei_Dai | 4y | 115
Karma | Title | Author | Age | Comments
62 | Steam | abramdemski | 6mo | 9
61 | When to use quantilization | RyanCarey | 3y | 5
61 | Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability | TurnTrout | 1y | 8
33 | Exploring Mild Behaviour in Embedded Agents | Megan Kinniment | 5mo | 3
32 | In Praise of Maximizing – With Some Caveats | David Althaus | 7y | 19
25 | Satisficers want to become maximisers | Stuart_Armstrong | 11y | 68
21 | Quantilizers and Generative Models | Adam Jermyn | 5mo | 5
19 | Quantilizers maximize expected utility subject to a conservative cost constraint | jessicata | 7y | 0
15 | Another view of quantilizers: avoiding Goodhart's Law | jessicata | 6y | 1
10 | Quantilal control for finite MDPs | Vanessa Kosoy | 4y | 0
10 | Optimization Regularization through Time Penalty | Linda Linsefors | 3y | 4
8 | Quantilizer ≡ Optimizer with a Bounded Amount of Output | itaibn0 | 1y | 4
7 | Defining a limited satisficer | Stuart_Armstrong | 7y | 11
7 | Is 'satificing' optimisation? | Riccardo Volpato | 2y | 3