17 posts
Myopia
9 posts
Deceptive Alignment
Deception
41
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Evan R. Murphy
15d
16
21
Understanding and controlling auto-induced distributional shift
LRudL
1y
3
50
LCDT, A Myopic Decision Theory
adamShimi
1y
51
31
Random Thoughts on Predict-O-Matic
abramdemski
3y
3
63
Partial Agency
abramdemski
3y
18
107
The Credit Assignment Problem
abramdemski
3y
40
65
Why GPT wants to mesa-optimize & how we might change this
John_Maxwell
2y
32
44
Towards a mechanistic understanding of corrigibility
evhub
3y
26
41
Acceptability Verification: A Research Agenda
David Udell
5mo
0
67
Arguments against myopic training
Richard_Ngo
2y
39
57
Open Problems with Myopia
Mark Xu
1y
16
36
Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability
Michaël Trazzi
1y
0
24
The Dualist Predict-O-Matic ($100 prize)
John_Maxwell
3y
35
57
AI safety via market making
evhub
2y
45
64
How likely is deceptive alignment?
evhub
3mo
21
30
Sticky goals: a concrete experiment for understanding deceptive alignment
evhub
3mo
13
119
Monitoring for deceptive alignment
evhub
3mo
7
8
Training Trace Priors
Adam Jermyn
6mo
17
17
Framings of Deceptive Alignment
peterbarnett
7mo
6
47
Will transparency help catch deception? Perhaps not
Matthew Barnett
3y
5
33
The Speed + Simplicity Prior is probably anti-deceptive
7mo
29
85
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
17
Precursor checking for deceptive alignment
evhub
4mo
0