Tag cluster (153 posts): Interpretability (ML & AI) · GPT · Machine Learning (ML) · DeepMind · OpenAI · Truth, Semantics, & Meaning · AI Success Models · Bounties & Prizes (active) · AI-assisted Alignment · Lottery Ticket Hypothesis · Computer Science · Honesty
Tag cluster (168 posts): Conjecture (org) · Oracle AI · Myopia · Language Models · Refine · Deconfusion · Agency · AI Boxing (Containment) · Deceptive Alignment · Scaling Laws · Deception · Acausal Trade
Top posts (score · title · author · age · comments):

422 · A Mechanistic Interpretability Analysis of Grokking · Neel Nanda · 4mo · 39 comments
410 · DeepMind alignment team opinions on AGI ruin arguments · Vika · 4mo · 34 comments
307 · A challenge for AGI organizations, and a challenge for readers · Rob Bensinger · 19d · 30 comments
255 · New Scaling Laws for Large Language Models · 1a3orn · 8mo · 21 comments
253 · Common misconceptions about OpenAI · Jacob_Hilton · 3mo · 138 comments
239 · The Plan - 2022 Update · johnswentworth · 19d · 33 comments
227 · Chris Olah’s views on AGI safety · evhub · 3y · 38 comments
175 · Godzilla Strategies · johnswentworth · 6mo · 65 comments
173 · A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] · Dan H · 7mo · 5 comments
171 · interpreting GPT: the logit lens · nostalgebraist · 2y · 32 comments
170 · The case for aligning narrowly superhuman models · Ajeya Cotra · 1y · 74 comments
169 · the scaling “inconsistency”: openAI’s new insight · nostalgebraist · 2y · 14 comments
156 · Understanding “Deep Double Descent” · evhub · 3y · 51 comments
155 · Developmental Stages of GPTs · orthonormal · 2y · 74 comments
Top posts (score · title · author · age · comments):

759 · Simulators · janus · 3mo · 103 comments
494 · chinchilla's wild implications · nostalgebraist · 4mo · 114 comments
324 · The Parable of Predict-O-Matic · abramdemski · 3y · 42 comments
254 · We Are Conjecture, A New Alignment Research Startup · Connor Leahy · 8mo · 24 comments
248 · Mysteries of mode collapse · janus · 1mo · 35 comments
223 · Conjecture: a retrospective after 8 months of work · Connor Leahy · 27d · 9 comments
222 · The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable · beren · 22d · 27 comments
191 · Announcing the Inverse Scaling Prize ($250k Prize Pool) · Ethan Perez · 5mo · 14 comments
190 · Interpreting Neural Networks through the Polytope Lens · Sid Black · 2mo · 26 comments
170 · Refine: An Incubator for Conceptual Alignment Research Bets · adamShimi · 8mo · 13 comments
165 · Language models seem to be much better than humans at next-token prediction · Buck · 4mo · 56 comments
140 · Who models the models that model models? An exploration of GPT-3's in-context model fitting ability · Lovre · 6mo · 14 comments
139 · Decision theory does not imply that we get to have nice things · So8res · 2mo · 53 comments
130 · Beyond Astronomical Waste · Wei_Dai · 4y · 41 comments