Myopia (17 posts)
Deceptive Alignment (9 posts)
Deception
75 karma | The Credit Assignment Problem | abramdemski | 3y | 40 comments
57 karma | Open Problems with Myopia | Mark Xu | 1y | 16 comments
53 karma | Partial Agency | abramdemski | 3y | 18 comments
53 karma | AI safety via market making | evhub | 2y | 45 comments
50 karma | LCDT, A Myopic Decision Theory | adamShimi | 1y | 51 comments
45 karma | Why GPT wants to mesa-optimize & how we might change this | John_Maxwell | 2y | 32 comments
45 karma | Acceptability Verification: A Research Agenda | David Udell | 5mo | 0 comments
45 karma | Arguments against myopic training | Richard_Ngo | 2y | 39 comments
44 karma | Towards a mechanistic understanding of corrigibility | evhub | 3y | 26 comments
33 karma | Steering Behaviour: Testing for (Non-)Myopia in Language Models | Evan R. Murphy | 15d | 16 comments
31 karma | Random Thoughts on Predict-O-Matic | abramdemski | 3y | 3 comments
31 karma | Understanding and controlling auto-induced distributional shift | LRudL | 1y | 3 comments
27 karma | Defining Myopia | abramdemski | 3y | 18 comments
27 karma | Bayesian Evolving-to-Extinction | abramdemski | 2y | 13 comments

117 karma | Monitoring for deceptive alignment | evhub | 3mo | 7 comments
89 karma | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13 comments
80 karma | How likely is deceptive alignment? | evhub | 3mo | 21 comments
40 karma | Sticky goals: a concrete experiment for understanding deceptive alignment | evhub | 3mo | 13 comments
39 karma | Will transparency help catch deception? Perhaps not | Matthew Barnett | 3y | 5 comments
29 karma | Framings of Deceptive Alignment | peterbarnett | 7mo | 6 comments
27 karma | The Speed + Simplicity Prior is probably anti-deceptive | 7mo | 29 comments
19 karma | Precursor checking for deceptive alignment | evhub | 4mo | 0 comments
16 karma | Training Trace Priors | Adam Jermyn | 6mo | 17 comments