Branch 1 tags (153 posts):
Interpretability (ML & AI)
GPT
Machine Learning (ML)
DeepMind
OpenAI
Truth, Semantics, & Meaning
AI Success Models
Bounties & Prizes (active)
AI-assisted Alignment
Lottery Ticket Hypothesis
Computer Science
Honesty
Branch 2 tags (168 posts):
Conjecture (org)
Oracle AI
Myopia
Language Models
Refine
Deconfusion
Agency
AI Boxing (Containment)
Deceptive Alignment
Scaling Laws
Deception
Acausal Trade
Branch 1 top posts:

Karma | Title | Author | Age | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
204 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74
199 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
191 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
183 | The Plan - 2022 Update | johnswentworth | 19d | 33
167 | Chris Olah’s views on AGI safety | evhub | 3y | 38
164 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
155 | A transparency and interpretability tech tree | evhub | 6mo | 10
146 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
145 | interpreting GPT: the logit lens | nostalgebraist | 2y | 32
135 | How much chess engine progress is about adapting to bigger computers? | paulfchristiano | 1y | 23
134 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
Branch 2 top posts:

Karma | Title | Author | Age | Comments
258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
234 | chinchilla's wild implications | nostalgebraist | 4mo | 114
185 | Simulators | janus | 3mo | 103
178 | Mysteries of mode collapse | janus | 1mo | 35
163 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56
157 | Transformer Circuits | evhub | 12mo | 4
145 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
143 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9
141 | Announcing the Inverse Scaling Prize ($250k Prize Pool) | Ethan Perez | 5mo | 14
119 | Monitoring for deceptive alignment | evhub | 3mo | 7
118 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24
113 | The case for becoming a black-box investigator of language models | Buck | 7mo | 19
108 | What I Learned Running Refine | adamShimi | 26d | 5
108 | Beyond Astronomical Waste | Wei_Dai | 4y | 41