48 posts
Corrigibility
Treacherous Turn
Tripwire
18 posts
Mild Optimization
Quantilization
Satisficer
26
Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility "real"
joraine
26d
11
4
Contrary to List of Lethality's point 22, alignment's door number 2
False Name, Esq.
6d
1
114
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
26
People care about each other even though they have imperfect motivational pointers?
TurnTrout
1mo
25
91
Let's See You Write That Corrigibility Tag
Eliezer Yudkowsky
6mo
67
9
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
25
CHAI, Assistance Games, And Fully-Updated Deference [Scott Alexander]
berglund
2mo
1
97
A broad basin of attraction around human values?
Wei_Dai
8mo
16
102
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res
1y
35
55
Corrigibility Can Be VNM-Incoherent
TurnTrout
1y
24
25
[Intro to brain-like-AGI safety] 14. Controlled AGI
Steven Byrnes
7mo
25
21
Infernal Corrigibility, Fiendishly Difficult
David Udell
6mo
1
5
Simple question about corrigibility and values in AI.
jmh
1mo
1
41
Solve Corrigibility Week
Logan Riggs
1y
21
62
Steam
abramdemski
6mo
9
33
Exploring Mild Behaviour in Embedded Agents
Megan Kinniment
5mo
3
21
Quantilizers and Generative Models
Adam Jermyn
5mo
5
61
Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability
TurnTrout
1y
8
61
When to use quantilization
RyanCarey
3y
5
8
Quantilizer ≡ Optimizer with a Bounded Amount of Output
itaibn0
1y
4
32
In Praise of Maximizing – With Some Caveats
David Althaus
7y
19
7
Is 'satificing' optimisation?
Riccardo Volpato
2y
3
10
Optimization Regularization through Time Penalty
Linda Linsefors
3y
4
19
Quantilizers maximize expected utility subject to a conservative cost constraint
jessicata
7y
0
10
Quantilal control for finite MDPs
Vanessa Kosoy
4y
0
15
Another view of quantilizers: avoiding Goodhart's Law
jessicata
6y
1
25
Satisficers want to become maximisers
Stuart_Armstrong
11y
68
7
Defining a limited satisficer
Stuart_Armstrong
7y
11