Deception

Myopia (17 posts)
The Credit Assignment Problem · abramdemski · 3y · 91 karma · 40 comments
Partial Agency · abramdemski · 3y · 58 karma · 18 comments
Open Problems with Myopia · Mark Xu · 1y · 57 karma · 16 comments
Arguments against myopic training · Richard_Ngo · 2y · 56 karma · 39 comments
Why GPT wants to mesa-optimize & how we might change this · John_Maxwell · 2y · 55 karma · 32 comments
AI safety via market making · evhub · 2y · 55 karma · 45 comments
LCDT, A Myopic Decision Theory · adamShimi · 1y · 50 karma · 51 comments
Towards a mechanistic understanding of corrigibility · evhub · 3y · 44 karma · 26 comments
Acceptability Verification: A Research Agenda · David Udell · 5mo · 43 karma · 0 comments
Bayesian Evolving-to-Extinction · abramdemski · 2y · 38 karma · 13 comments
Steering Behaviour: Testing for (Non-)Myopia in Language Models · Evan R. Murphy · 15d · 37 karma · 16 comments
Defining Myopia · abramdemski · 3y · 32 karma · 18 comments
Random Thoughts on Predict-O-Matic · abramdemski · 3y · 31 karma · 3 comments
Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability · Michaël Trazzi · 1y · 28 karma · 0 comments

Deceptive Alignment (9 posts)
Monitoring for deceptive alignment · evhub · 3mo · 118 karma · 7 comments
Trying to Make a Treacherous Mesa-Optimizer · MadHatter · 1mo · 87 karma · 13 comments
How likely is deceptive alignment? · evhub · 3mo · 72 karma · 21 comments
Will transparency help catch deception? Perhaps not · Matthew Barnett · 3y · 43 karma · 5 comments
Sticky goals: a concrete experiment for understanding deceptive alignment · evhub · 3mo · 35 karma · 13 comments
The Speed + Simplicity Prior is probably anti-deceptive · 7mo · 30 karma · 29 comments
Framings of Deceptive Alignment · peterbarnett · 7mo · 23 karma · 6 comments
Precursor checking for deceptive alignment · evhub · 4mo · 18 karma · 0 comments
Training Trace Priors · Adam Jermyn · 6mo · 12 karma · 17 comments