48 posts
Corrigibility
Treacherous Turn
Tripwire
18 posts
Mild Optimization
Quantilization
Satisficer
Score | Title | Author | Age | Comments
25 | Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real" | joraine | 26d | 11
32 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
13 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
109 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
105 | A broad basin of attraction around human values? | Wei_Dai | 8mo | 16
21 | CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander] | berglund | 2mo | 1
118 | Soares, Tallinn, and Yudkowsky discuss AGI cognition | So8res | 1y | 35
64 | Corrigibility Can Be VNM-Incoherent | TurnTrout | 1y | 24
6 | Simple question about corrigibility and values in AI. | jmh | 1mo | 1
26 | [Intro to brain-like-AGI safety] 14. Controlled AGI | Steven Byrnes | 7mo | 25
39 | Solve Corrigibility Week | Logan Riggs | 1y | 21
16 | On corrigibility and its basin | Donald Hobson | 6mo | 3
15 | Infernal Corrigibility, Fiendishly Difficult | David Udell | 6mo | 1
61 | Steam | abramdemski | 6mo | 9
24 | Quantilizers and Generative Models | Adam Jermyn | 5mo | 5
69 | Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability | TurnTrout | 1y | 8
21 | Exploring Mild Behaviour in Embedded Agents | Megan Kinniment | 5mo | 3
65 | When to use quantilization | RyanCarey | 3y | 5
10 | Quantilizer ≡ Optimizer with a Bounded Amount of Output | itaibn0 | 1y | 4
31 | In Praise of Maximizing – With Some Caveats | David Althaus | 7y | 19
25 | Quantilizers maximize expected utility subject to a conservative cost constraint | jessicata | 7y | 0
14 | Quantilal control for finite MDPs | Vanessa Kosoy | 4y | 0
11 | Optimization Regularization through Time Penalty | Linda Linsefors | 3y | 4
20 | Another view of quantilizers: avoiding Goodhart's Law | jessicata | 6y | 1
33 | Satisficers want to become maximisers | Stuart_Armstrong | 11y | 68
5 | Is 'satificing' optimisation? | Riccardo Volpato | 2y | 3
10 | Defining a limited satisficer | Stuart_Armstrong | 7y | 11