54 posts
Oracle AI
AI Boxing (Containment)
Acausal Trade
Bounties (closed)
Superrationality
Parables & Fables
Self Fulfilling/Refuting Prophecies
Values handshakes
Verification
Computer Security & Cryptography
26 posts
Myopia
Deceptive Alignment
Deception
258
The Parable of Predict-O-Matic
abramdemski
3y
42
145
Decision theory does not imply that we get to have nice things
So8res
2mo
53
72
Prize for probable problems
paulfchristiano
4y
63
67
Results of $1,000 Oracle contest!
Stuart_Armstrong
2y
2
65
Cryptographic Boxes for Unfriendly AI
paulfchristiano
12y
162
60
Contest: $1,000 for good questions to ask to an Oracle AI
Stuart_Armstrong
3y
156
57
Counterfactual Oracles = online supervised learning with random selection of training episodes
Wei_Dai
3y
26
48
Superrational Agents Kelly Bet Influence!
abramdemski
1y
5
43
Proper scoring rules don’t guarantee predicting fixed points
Johannes_Treutlein
4d
2
36
Self-Supervised Learning and AGI Safety
Steven Byrnes
3y
9
35
Side-channels: input versus output
davidad
8d
9
33
Safely and usefully spectating on AIs optimizing over toy worlds
AlexMennen
4y
16
32
Reflective oracles as a solution to the converse Lawvere problem
SamEisenstat
4y
0
29
Superrationality in arbitrary games
Vanessa Kosoy
7y
0
119
Monitoring for deceptive alignment
evhub
3mo
7
107
The Credit Assignment Problem
abramdemski
3y
40
85
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
67
Arguments against myopic training
Richard_Ngo
2y
39
65
Why GPT wants to mesa-optimize & how we might change this
John_Maxwell
2y
32
64
How likely is deceptive alignment?
evhub
3mo
21
63
Partial Agency
abramdemski
3y
18
57
Open Problems with Myopia
Mark Xu
1y
16
57
AI safety via market making
evhub
2y
45
50
LCDT, A Myopic Decision Theory
adamShimi
1y
51
49
Bayesian Evolving-to-Extinction
abramdemski
2y
13
47
Will transparency help catch deception? Perhaps not
Matthew Barnett
3y
5
44
Towards a mechanistic understanding of corrigibility
evhub
3y
26
41
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Evan R. Murphy
15d
16