48 posts: Corrigibility, Treacherous Turn, Tripwire
18 posts: Mild Optimization, Quantilization, Satisficer
134 | Soares, Tallinn, and Yudkowsky discuss AGI cognition | So8res | 1y | 35
127 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
113 | A broad basin of attraction around human values? | Wei_Dai | 8mo | 16
108 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
73 | Corrigibility Can Be VNM-Incoherent | TurnTrout | 1y | 24
73 | Introducing Corrigibility (an FAI research subfield) | So8res | 8y | 28
72 | Boeing 737 MAX MCAS as an agent corrigibility failure | shminux | 3y | 3
66 | A Gym Gridworld Environment for the Treacherous Turn | Michaël Trazzi | 4y | 9
63 | Cake, or death! | Stuart_Armstrong | 10y | 13
52 | Corrigibility | paulfchristiano | 4y | 7
41 | Can corrigibility be learned safely? | Wei_Dai | 4y | 115
39 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6
39 | A toy model of the treacherous turn | Stuart_Armstrong | 6y | 13
38 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
77 | Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability | TurnTrout | 1y | 8
69 | When to use quantilization | RyanCarey | 3y | 5
60 | Steam | abramdemski | 6mo | 9
41 | Satisficers want to become maximisers | Stuart_Armstrong | 11y | 68
31 | Quantilizers maximize expected utility subject to a conservative cost constraint | jessicata | 7y | 0
30 | In Praise of Maximizing – With Some Caveats | David Althaus | 7y | 19
27 | Quantilizers and Generative Models | Adam Jermyn | 5mo | 5
25 | Another view of quantilizers: avoiding Goodhart's Law | jessicata | 6y | 1
18 | Quantilal control for finite MDPs | Vanessa Kosoy | 4y | 0
13 | Defining a limited satisficer | Stuart_Armstrong | 7y | 11
12 | Quantilizer ≡ Optimizer with a Bounded Amount of Output | itaibn0 | 1y | 4
12 | Optimization Regularization through Time Penalty | Linda Linsefors | 3y | 4
11 | Creating a satisficer | Stuart_Armstrong | 7y | 26
9 | Exploring Mild Behaviour in Embedded Agents | Megan Kinniment | 5mo | 3