3 posts
Deceptive Alignment
6 posts
Deception
87
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
35
Sticky goals: a concrete experiment for understanding deceptive alignment
evhub
3mo
13
23
Framings of Deceptive Alignment
peterbarnett
7mo
6
118
Monitoring for deceptive alignment
evhub
3mo
7
72
How likely is deceptive alignment?
evhub
3mo
21
18
Precursor checking for deceptive alignment
evhub
4mo
0
30
The Speed + Simplicity Prior is probably anti-deceptive
7mo
29
12
Training Trace Priors
Adam Jermyn
6mo
17
43
Will transparency help catch deception? Perhaps not
Matthew Barnett
3y
5