1913 posts: AI, World Modeling, Inner Alignment, Rationality, Interpretability (ML & AI), AI Timelines, Decision Theory, GPT, Research Agendas, Abstraction, Value Learning, Impact Regularization

855 posts: Logical Induction, Threat Models, Goodhart's Law, Practice & Philosophy of Science, Logical Uncertainty, Intellectual Progress (Society-Level), Radical Probabilism, Epistemology, Ethics & Morality, Software Tools, Fiction, Bayes' Theorem
Karma | Title | Author | Age | Comments
981 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
759 | Simulators | janus | 3mo | 103
503 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
494 | chinchilla's wild implications | nostalgebraist | 4mo | 114
486 | What 2026 looks like | Daniel Kokotajlo | 1y | 98
422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
410 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
409 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
381 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
334 | EfficientZero: How It Works | 1a3orn | 1y | 42
324 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
315 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
297 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54

Karma | Title | Author | Age | Comments
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
517 | It Looks Like You're Trying To Take Over The World | gwern | 9mo | 125
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
416 | What failure looks like | paulfchristiano | 3y | 49
413 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
305 | Alignment Research Field Guide | abramdemski | 3y | 9
292 | A central AI alignment problem: capabilities generalization, and the sharp left turn | So8res | 6mo | 48
284 | Six Dimensions of Operational Adequacy in AGI Projects | Eliezer Yudkowsky | 6mo | 65
265 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn | 2mo | 16
252 | What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) | Andrew_Critch | 1y | 60
240 | Another (outer) alignment failure story | paulfchristiano | 1y | 38
227 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60
221 | Call For Distillers | johnswentworth | 8mo | 42
207 | Some AI research areas and their relevance to existential safety | Andrew_Critch | 2y | 40