Corrigibility, Treacherous Turn, Tripwire: 48 posts
Mild Optimization, Quantilization, Satisficer: 18 posts

Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real" (joraine, 26d; 24 karma, 11 comments)
People care about each other even though they have imperfect motivational pointers? (TurnTrout, 1mo; 38 karma, 25 comments)
Corrigibility Via Thought-Process Deference (Thane Ruthenis, 26d; 17 karma, 5 comments)
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable (johnswentworth, 4mo; 108 karma, 8 comments)
Let's See You Write That Corrigibility Tag (Eliezer Yudkowsky, 6mo; 127 karma, 67 comments)
A broad basin of attraction around human values? (Wei_Dai, 8mo; 113 karma, 16 comments)
Soares, Tallinn, and Yudkowsky discuss AGI cognition (So8res, 1y; 134 karma, 35 comments)
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander] (berglund, 2mo; 17 karma, 1 comment)
Corrigibility Can Be VNM-Incoherent (TurnTrout, 1y; 73 karma, 24 comments)
Simple question about corrigibility and values in AI. (jmh, 1mo; 7 karma, 1 comment)
[Intro to brain-like-AGI safety] 14. Controlled AGI (Steven Byrnes, 7mo; 27 karma, 25 comments)
On corrigibility and its basin (Donald Hobson, 6mo; 16 karma, 3 comments)
Solve Corrigibility Week (Logan Riggs, 1y; 37 karma, 21 comments)
Formalizing Policy-Modification Corrigibility (TurnTrout, 1y; 25 karma, 6 comments)
Steam (abramdemski, 6mo; 60 karma, 9 comments)
Quantilizers and Generative Models (Adam Jermyn, 5mo; 27 karma, 5 comments)
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability (TurnTrout, 1y; 77 karma, 8 comments)
Exploring Mild Behaviour in Embedded Agents (Megan Kinniment, 5mo; 9 karma, 3 comments)
When to use quantilization (RyanCarey, 3y; 69 karma, 5 comments)
Quantilizer ≡ Optimizer with a Bounded Amount of Output (itaibn0, 1y; 12 karma, 4 comments)
Quantilizers maximize expected utility subject to a conservative cost constraint (jessicata, 7y; 31 karma, 0 comments)
Quantilal control for finite MDPs (Vanessa Kosoy, 4y; 18 karma, 0 comments)
In Praise of Maximizing – With Some Caveats (David Althaus, 7y; 30 karma, 19 comments)
Another view of quantilizers: avoiding Goodhart's Law (jessicata, 6y; 25 karma, 1 comment)
Satisficers want to become maximisers (Stuart_Armstrong, 11y; 41 karma, 68 comments)
Optimization Regularization through Time Penalty (Linda Linsefors, 3y; 12 karma, 4 comments)
Defining a limited satisficer (Stuart_Armstrong, 7y; 13 karma, 11 comments)
Is 'satificing' optimisation? (Riccardo Volpato, 2y; 3 karma, 3 comments)