Branch 1 (1564 posts), tags: AI, Inner Alignment, Interpretability (ML & AI), AI Timelines, GPT, Research Agendas, AI Takeoff, Value Learning, Machine Learning (ML), Conjecture (org), Mesa-Optimization, Outer Alignment
Branch 2 (349 posts), tags: Abstraction, Impact Regularization, Rationality, World Modeling, Decision Theory, Human Values, Goal-Directedness, Anthropics, Utility Functions, Finite Factored Sets, Shard Theory, Fixed Point Theorems
Top posts, branch 1:
1. Simulators (janus, 3mo): 759 karma, 103 comments
2. (My understanding of) What Everyone in Technical Alignment is Doing and Why (Thomas Larsen, 3mo): 503 karma, 83 comments
3. chinchilla's wild implications (nostalgebraist, 4mo): 494 karma, 114 comments
4. What 2026 looks like (Daniel Kokotajlo, 1y): 486 karma, 98 comments
5. A Mechanistic Interpretability Analysis of Grokking (Neel Nanda, 4mo): 422 karma, 39 comments
6. DeepMind alignment team opinions on AGI ruin arguments (Vika, 4mo): 410 karma, 34 comments
7. Discussion with Eliezer Yudkowsky on AGI interventions (Rob Bensinger, 1y): 409 karma, 257 comments
8. EfficientZero: How It Works (1a3orn, 1y): 334 karma, 42 comments
9. The Parable of Predict-O-Matic (abramdemski, 3y): 324 karma, 42 comments
10. Two-year update on my personal AI timelines (Ajeya Cotra, 4mo): 315 karma, 60 comments
11. A challenge for AGI organizations, and a challenge for readers (Rob Bensinger, 19d): 307 karma, 30 comments
12. Why Agent Foundations? An Overly Abstract Explanation (johnswentworth, 9mo): 297 karma, 54 comments
13. Are we in an AI overhang? (Andy Jones, 2y): 296 karma, 109 comments
14. On how various plans miss the hard bits of the alignment challenge (So8res, 5mo): 285 karma, 81 comments
Top posts, branch 2:
1. Where I agree and disagree with Eliezer (paulfchristiano, 6mo): 981 karma, 205 comments
2. Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (Ajeya Cotra, 5mo): 381 karma, 89 comments
3. The shard theory of human values (Quintin Pope, 3mo): 249 karma, 57 comments
4. Realism about rationality (Richard_Ngo, 4y): 206 karma, 145 comments
5. Utility Maximization = Description Length Minimization (johnswentworth, 1y): 196 karma, 40 comments
6. Humans provide an untapped wealth of evidence about alignment (TurnTrout, 5mo): 191 karma, 92 comments
7. 2021 AI Alignment Literature Review and Charity Comparison (Larks, 12mo): 175 karma, 26 comments
8. Finite Factored Sets in Pictures (Magdalena Wache, 9d): 171 karma, 29 comments
9. Evolution of Modularity (johnswentworth, 3y): 163 karma, 12 comments
10. Can you control the past? (Joe Carlsmith, 1y): 160 karma, 93 comments
11. why assume AGIs will optimize for fixed goals? (nostalgebraist, 6mo): 146 karma, 52 comments
12. Finite Factored Sets (Scott Garrabrant, 1y): 141 karma, 94 comments
13. What's Up With Confusingly Pervasive Consequentialism? (Raemon, 11mo): 137 karma, 88 comments
14. My research methodology (paulfchristiano, 1y): 137 karma, 36 comments