80 posts — tags: Oracle AI, Myopia, AI Boxing (Containment), Deceptive Alignment, Deception, Acausal Trade, Self Fulfilling/Refuting Prophecies, Bounties (closed), Parables & Fables, Superrationality, Values handshakes, Computer Security & Cryptography
88 posts — tags: Conjecture (org), Language Models, Refine, Agency, Deconfusion, Scaling Laws, Project Announcement, Encultured AI (org), Tool AI, Definitions, PaLM, Prompt Engineering
Karma | Title | Author | Posted | Comments
291 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
118 | Monitoring for deceptive alignment | evhub | 3mo | 7
91 | The Credit Assignment Problem | abramdemski | 3y | 40
87 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13
72 | How likely is deceptive alignment? | evhub | 3mo | 21
60 | Prize for probable problems | paulfchristiano | 4y | 63
58 | Partial Agency | abramdemski | 3y | 18
58 | Results of $1,000 Oracle contest! | Stuart_Armstrong | 2y | 2
57 | Open Problems with Myopia | Mark Xu | 1y | 16
57 | Contest: $1,000 for good questions to ask to an Oracle AI | Stuart_Armstrong | 3y | 156
56 | Arguments against myopic training | Richard_Ngo | 2y | 39
55 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
55 | Why GPT wants to mesa-optimize & how we might change this | John_Maxwell | 2y | 32
Karma | Title | Author | Posted | Comments
472 | Simulators | janus | 3mo | 103
364 | chinchilla's wild implications | nostalgebraist | 4mo | 114
213 | Mysteries of mode collapse | janus | 1mo | 35
186 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24
183 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9
166 | Announcing the Inverse Scaling Prize ($250k Prize Pool) | Ethan Perez | 5mo | 14
164 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56
159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
142 | Transformer Circuits | evhub | 12mo | 4
123 | Refine: An Incubator for Conceptual Alignment Research Bets | adamShimi | 8mo | 13
123 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
119 | Beyond Astronomical Waste | Wei_Dai | 4y | 41
118 | The case for becoming a black-box investigator of language models | Buck | 7mo | 19
112 | Who models the models that model models? An exploration of GPT-3's in-context model fitting ability | Lovre | 6mo | 14