54 posts
Oracle AI
AI Boxing (Containment)
Acausal Trade
Bounties (closed)
Superrationality
Parables & Fables
Self Fulfilling/Refuting Prophecies
Values handshakes
Verification
Computer Security & Cryptography
26 posts
Myopia
Deceptive Alignment
Deception
324
The Parable of Predict-O-Matic
abramdemski
3y
42
139
Decision theory does not imply that we get to have nice things
So8res
2mo
53
67
Proper scoring rules don’t guarantee predicting fixed points
Johannes_Treutlein
4d
2
54
Contest: $1,000 for good questions to ask to an Oracle AI
Stuart_Armstrong
3y
156
49
Results of $1,000 Oracle contest!
Stuart_Armstrong
2y
2
48
Prize for probable problems
paulfchristiano
4y
63
43
Cryptographic Boxes for Unfriendly AI
paulfchristiano
12y
162
37
Counterfactual Oracles = online supervised learning with random selection of training episodes
Wei_Dai
3y
26
35
Side-channels: input versus output
davidad
8d
9
34
Superrational Agents Kelly Bet Influence!
abramdemski
1y
5
27
Training goals for large language models
Johannes_Treutlein
5mo
5
22
Breaking Oracles: superrationality and acausal trade
Stuart_Armstrong
3y
15
22
Self-Supervised Learning and AGI Safety
Steven Byrnes
3y
9
17
Analysing: Dangerous messages from future UFAI via Oracles
Stuart_Armstrong
3y
16
117
Monitoring for deceptive alignment
evhub
3mo
7
89
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
80
How likely is deceptive alignment?
evhub
3mo
21
75
The Credit Assignment Problem
abramdemski
3y
40
57
Open Problems with Myopia
Mark Xu
1y
16
53
Partial Agency
abramdemski
3y
18
53
AI safety via market making
evhub
2y
45
50
LCDT, A Myopic Decision Theory
adamShimi
1y
51
45
Why GPT wants to mesa-optimize & how we might change this
John_Maxwell
2y
32
45
Acceptability Verification: A Research Agenda
David Udell
5mo
0
45
Arguments against myopic training
Richard_Ngo
2y
39
44
Towards a mechanistic understanding of corrigibility
evhub
3y
26
40
Sticky goals: a concrete experiment for understanding deceptive alignment
evhub
3mo
13
39
Will transparency help catch deception? Perhaps not
Matthew Barnett
3y
5