80 posts
Oracle AI
Myopia
AI Boxing (Containment)
Deceptive Alignment
Deception
Acausal Trade
Self Fulfilling/Refuting Prophecies
Bounties (closed)
Parables & Fables
Superrationality
Values handshakes
Computer Security & Cryptography
88 posts
Conjecture (org)
Language Models
Refine
Agency
Deconfusion
Scaling Laws
Project Announcement
Encultured AI (org)
Tool AI
Definitions
PaLM
Prompt Engineering
324
The Parable of Predict-O-Matic
abramdemski
3y
42
139
Decision theory does not imply that we get to have nice things
So8res
2mo
53
117
Monitoring for deceptive alignment
evhub
3mo
7
89
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
80
How likely is deceptive alignment?
evhub
3mo
21
75
The Credit Assignment Problem
abramdemski
3y
40
67
Proper scoring rules don’t guarantee predicting fixed points
Johannes_Treutlein
4d
2
57
Open Problems with Myopia
Mark Xu
1y
16
54
Contest: $1,000 for good questions to ask to an Oracle AI
Stuart_Armstrong
3y
156
53
Partial Agency
abramdemski
3y
18
53
AI safety via market making
evhub
2y
45
50
LCDT, A Myopic Decision Theory
adamShimi
1y
51
49
Results of $1,000 Oracle contest!
Stuart_Armstrong
2y
2
48
Prize for probable problems
paulfchristiano
4y
63
759
Simulators
janus
3mo
103
494
chinchilla's wild implications
nostalgebraist
4mo
114
254
We Are Conjecture, A New Alignment Research Startup
Connor Leahy
8mo
24
248
Mysteries of mode collapse
janus
1mo
35
223
Conjecture: a retrospective after 8 months of work
Connor Leahy
27d
9
222
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren
22d
27
191
Announcing the Inverse Scaling Prize ($250k Prize Pool)
Ethan Perez
5mo
14
190
Interpreting Neural Networks through the Polytope Lens
Sid Black
2mo
26
170
Refine: An Incubator for Conceptual Alignment Research Bets
adamShimi
8mo
13
165
Language models seem to be much better than humans at next-token prediction
Buck
4mo
56
140
Who models the models that model models? An exploration of GPT-3's in-context model fitting ability
Lovre
6mo
14
130
Beyond Astronomical Waste
Wei_Dai
4y
41
127
Transformer Circuits
evhub
12mo
4
127
Circumventing interpretability: How to defeat mind-readers
Lee Sharkey
5mo
8