80 posts
Oracle AI
Myopia
AI Boxing (Containment)
Deceptive Alignment
Deception
Acausal Trade
Self Fulfilling/Refuting Prophecies
Bounties (closed)
Parables & Fables
Superrationality
Values handshakes
Computer Security & Cryptography
88 posts
Conjecture (org)
Language Models
Refine
Agency
Deconfusion
Scaling Laws
Project Announcement
Encultured AI (org)
Tool AI
Definitions
PaLM
Prompt Engineering
258
The Parable of Predict-O-Matic
abramdemski
3y
42
145
Decision theory does not imply that we get to have nice things
So8res
2mo
53
119
Monitoring for deceptive alignment
evhub
3mo
7
107
The Credit Assignment Problem
abramdemski
3y
40
85
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
1mo
13
72
Prize for probable problems
paulfchristiano
4y
63
67
Arguments against myopic training
Richard_Ngo
2y
39
67
Results of $1,000 Oracle contest!
Stuart_Armstrong
2y
2
65
Cryptographic Boxes for Unfriendly AI
paulfchristiano
12y
162
65
Why GPT wants to mesa-optimize & how we might change this
John_Maxwell
2y
32
64
How likely is deceptive alignment?
evhub
3mo
21
63
Partial Agency
abramdemski
3y
18
60
Contest: $1,000 for good questions to ask to an Oracle AI
Stuart_Armstrong
3y
156
57
Open Problems with Myopia
Mark Xu
1y
16
234
chinchilla's wild implications
nostalgebraist
4mo
114
185
Simulators
janus
3mo
103
178
Mysteries of mode collapse
janus
1mo
35
163
Language models seem to be much better than humans at next-token prediction
Buck
4mo
56
157
Transformer Circuits
evhub
12mo
4
143
Conjecture: a retrospective after 8 months of work
Connor Leahy
27d
9
141
Announcing the Inverse Scaling Prize ($250k Prize Pool)
Ethan Perez
5mo
14
118
We Are Conjecture, A New Alignment Research Startup
Connor Leahy
8mo
24
113
The case for becoming a black-box investigator of language models
Buck
7mo
19
108
What I Learned Running Refine
adamShimi
26d
5
108
Beyond Astronomical Waste
Wei_Dai
4y
41
106
Testing PaLM prompts on GPT3
Yitz
8mo
15
105
Announcing Encultured AI: Building a Video Game
Andrew_Critch
4mo
26
99
Help ARC evaluate capabilities of current language models (still need people)
Beth Barnes
5mo
6