17 posts
Myopia
9 posts
Deceptive Alignment
Deception
33
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Evan R. Murphy
15d
16
45
Acceptability Verification: A Research Agenda
David Udell
5mo
0
50
LCDT, A Myopic Decision Theory
adamShimi
1y
51
31
Understanding and controlling auto-induced distributional shift
LRudL
1y
3
57
Open Problems with Myopia
Mark Xu
1y
16
75
The Credit Assignment Problem
abramdemski
3y
40
53
AI safety via market making
evhub
2y
45
45
Why GPT wants to mesa-optimize & how we might change this
John_Maxwell
2y
32
45
Arguments against myopic training
Richard_Ngo
2y
39
53
Partial Agency
abramdemski
3y
18
20
Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability
Michaël Trazzi
1y
0
44
Towards a mechanistic understanding of corrigibility
evhub
3y
26
31
Random Thoughts on Predict-O-Matic
abramdemski
3y
3
27
Bayesian Evolving-to-Extinction
abramdemski
2y
13
89
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
117
Monitoring for deceptive alignment
evhub
3mo
7
80
How likely is deceptive alignment?
evhub
3mo
21
40
Sticky goals: a concrete experiment for understanding deceptive alignment
evhub
3mo
13
19
Precursor checking for deceptive alignment
evhub
4mo
0
29
Framings of Deceptive Alignment
peterbarnett
7mo
6
27
The Speed + Simplicity Prior is probably anti-deceptive
7mo
29
16
Training Trace Priors
Adam Jermyn
6mo
17
39
Will transparency help catch deception? Perhaps not
Matthew Barnett
3y
5