1125 posts
Tags: AI, Research Agendas, AI Timelines, Value Learning, AI Takeoff, Embedded Agency, Eliciting Latent Knowledge (ELK), Community, Reinforcement Learning, Iterated Amplification, Debate (AI safety technique), Game Theory
321 posts
Tags: Conjecture (org), GPT, Oracle AI, Interpretability (ML & AI), Myopia, Language Models, OpenAI, AI Boxing (Containment), Machine Learning (ML), DeepMind, Acausal Trade, Scaling Laws
Score | Title | Author | Posted | Comments
259 | Humans are very reliable agents | alyssavance | 6mo | 35
259 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
252 | What 2026 looks like | Daniel Kokotajlo | 1y | 98
242 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
241 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
231 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
219 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
217 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
214 | Are we in an AI overhang? | Andy Jones | 2y | 109
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
213 | Hiring engineers and researchers to help align GPT-3 | paulfchristiano | 2y | 14
212 | EfficientZero: How It Works | 1a3orn | 1y | 42
Score | Title | Author | Posted | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
234 | chinchilla's wild implications | nostalgebraist | 4mo | 114
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
204 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74
199 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
191 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
185 | Simulators | janus | 3mo | 103
183 | The Plan - 2022 Update | johnswentworth | 19d | 33
178 | Mysteries of mode collapse | janus | 1mo | 35
167 | Chris Olah’s views on AGI safety | evhub | 3y | 38
164 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
163 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56