54 posts: Oracle AI, AI Boxing (Containment), Acausal Trade, Bounties (closed), Superrationality, Parables & Fables, Self Fulfilling/Refuting Prophecies, Values handshakes, Verification, Computer Security & Cryptography
26 posts: Myopia, Deceptive Alignment, Deception
The Parable of Predict-O-Matic (abramdemski, 3y): 291 karma, 42 comments
Decision theory does not imply that we get to have nice things (So8res, 2mo): 142 karma, 53 comments
Prize for probable problems (paulfchristiano, 4y): 60 karma, 63 comments
Results of $1,000 Oracle contest! (Stuart_Armstrong, 2y): 58 karma, 2 comments
Contest: $1,000 for good questions to ask to an Oracle AI (Stuart_Armstrong, 3y): 57 karma, 156 comments
Proper scoring rules don’t guarantee predicting fixed points (Johannes_Treutlein, 4d): 55 karma, 2 comments
Cryptographic Boxes for Unfriendly AI (paulfchristiano, 12y): 54 karma, 162 comments
Counterfactual Oracles = online supervised learning with random selection of training episodes (Wei_Dai, 3y): 47 karma, 26 comments
Superrational Agents Kelly Bet Influence! (abramdemski, 1y): 41 karma, 5 comments
Side-channels: input versus output (davidad, 8d): 35 karma, 9 comments
Self-Supervised Learning and AGI Safety (Steven Byrnes, 3y): 29 karma, 9 comments
Training goals for large language models (Johannes_Treutlein, 5mo): 26 karma, 5 comments
Breaking Oracles: superrationality and acausal trade (Stuart_Armstrong, 3y): 25 karma, 15 comments
Reflective oracles as a solution to the converse Lawvere problem (SamEisenstat, 4y): 24 karma, 0 comments
Monitoring for deceptive alignment (evhub, 3mo): 118 karma, 7 comments
The Credit Assignment Problem (abramdemski, 3y): 91 karma, 40 comments
Trying to Make a Treacherous Mesa-Optimizer (MadHatter, 1mo): 87 karma, 13 comments
How likely is deceptive alignment? (evhub, 3mo): 72 karma, 21 comments
Partial Agency (abramdemski, 3y): 58 karma, 18 comments
Open Problems with Myopia (Mark Xu, 1y): 57 karma, 16 comments
Arguments against myopic training (Richard_Ngo, 2y): 56 karma, 39 comments
Why GPT wants to mesa-optimize & how we might change this (John_Maxwell, 2y): 55 karma, 32 comments
AI safety via market making (evhub, 2y): 55 karma, 45 comments
LCDT, A Myopic Decision Theory (adamShimi, 1y): 50 karma, 51 comments
Towards a mechanistic understanding of corrigibility (evhub, 3y): 44 karma, 26 comments
Acceptability Verification: A Research Agenda (David Udell, 5mo): 43 karma, 0 comments
Will transparency help catch deception? Perhaps not (Matthew Barnett, 3y): 43 karma, 5 comments
Bayesian Evolving-to-Extinction (abramdemski, 2y): 38 karma, 13 comments