153 posts: Interpretability (ML & AI) · GPT · Machine Learning (ML) · DeepMind · OpenAI · Truth, Semantics, & Meaning · AI Success Models · Bounties & Prizes (active) · AI-assisted Alignment · Lottery Ticket Hypothesis · Computer Science · Honesty
168 posts: Conjecture (org) · Oracle AI · Myopia · Language Models · Refine · Deconfusion · Agency · AI Boxing (Containment) · Deceptive Alignment · Scaling Laws · Deception · Acausal Trade
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34 |
| 338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39 |
| 265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 226 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138 |
| 223 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21 |
| 211 | The Plan - 2022 Update | johnswentworth | 19d | 33 |
| 197 | Chris Olah’s views on AGI safety | evhub | 3y | 38 |
| 187 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74 |
| 158 | interpreting GPT: the logit lens | nostalgebraist | 2y | 32 |
| 151 | Godzilla Strategies | johnswentworth | 6mo | 65 |
| 146 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14 |
| 140 | Developmental Stages of GPTs | orthonormal | 2y | 74 |
| 139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16 |
| 136 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13 |
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 472 | Simulators | janus | 3mo | 103 |
| 364 | chinchilla's wild implications | nostalgebraist | 4mo | 114 |
| 291 | The Parable of Predict-O-Matic | abramdemski | 3y | 42 |
| 213 | Mysteries of mode collapse | janus | 1mo | 35 |
| 186 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24 |
| 183 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9 |
| 166 | Announcing the Inverse Scaling Prize ($250k Prize Pool) | Ethan Perez | 5mo | 14 |
| 164 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56 |
| 159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27 |
| 142 | Transformer Circuits | evhub | 12mo | 4 |
| 142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 123 | Refine: An Incubator for Conceptual Alignment Research Bets | adamShimi | 8mo | 13 |
| 123 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26 |
| 119 | Beyond Astronomical Waste | Wei_Dai | 4y | 41 |