54 posts tagged: Oracle AI, AI Boxing (Containment), Acausal Trade, Bounties (closed), Superrationality, Parables & Fables, Self Fulfilling/Refuting Prophecies, Values handshakes, Verification, Computer Security & Cryptography
26 posts tagged: Myopia, Deceptive Alignment, Deception
| Karma | Title | Author | Age | Comments |
|---|---|---|---|---|
| 43 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 35 | Side-channels: input versus output | davidad | 8d | 9 |
| 145 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42 |
| 25 | Training goals for large language models | Johannes_Treutlein | 5mo | 5 |
| 48 | Superrational Agents Kelly Bet Influence! | abramdemski | 1y | 5 |
| 67 | Results of $1,000 Oracle contest! | Stuart_Armstrong | 2y | 2 |
| 60 | Contest: $1,000 for good questions to ask to an Oracle AI | Stuart_Armstrong | 3y | 156 |
| 57 | Counterfactual Oracles = online supervised learning with random selection of training episodes | Wei_Dai | 3y | 26 |
| 72 | Prize for probable problems | paulfchristiano | 4y | 63 |
| 36 | Self-Supervised Learning and AGI Safety | Steven Byrnes | 3y | 9 |
| 28 | Oracles: reject all deals - break superrationality, with superrationality | Stuart_Armstrong | 3y | 4 |
| 28 | Breaking Oracles: superrationality and acausal trade | Stuart_Armstrong | 3y | 15 |
| 27 | Analysing: Dangerous messages from future UFAI via Oracles | Stuart_Armstrong | 3y | 16 |
| 41 | Steering Behaviour: Testing for (Non-)Myopia in Language Models | Evan R. Murphy | 15d | 16 |
| 85 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13 |
| 119 | Monitoring for deceptive alignment | evhub | 3mo | 7 |
| 64 | How likely is deceptive alignment? | evhub | 3mo | 21 |
| 30 | Sticky goals: a concrete experiment for understanding deceptive alignment | evhub | 3mo | 13 |
| 41 | Acceptability Verification: A Research Agenda | David Udell | 5mo | 0 |
| 33 | The Speed + Simplicity Prior is probably anti-deceptive | | 7mo | 29 |
| 17 | Precursor checking for deceptive alignment | evhub | 4mo | 0 |
| 50 | LCDT, A Myopic Decision Theory | adamShimi | 1y | 51 |
| 57 | Open Problems with Myopia | Mark Xu | 1y | 16 |
| 107 | The Credit Assignment Problem | abramdemski | 3y | 40 |
| 17 | Framings of Deceptive Alignment | peterbarnett | 7mo | 6 |
| 65 | Why GPT wants to mesa-optimize & how we might change this | John_Maxwell | 2y | 32 |
| 67 | Arguments against myopic training | Richard_Ngo | 2y | 39 |