1564 posts
AI
Inner Alignment
Interpretability (ML & AI)
AI Timelines
GPT
Research Agendas
AI Takeoff
Value Learning
Machine Learning (ML)
Conjecture (org)
Mesa-Optimization
Outer Alignment
349 posts
Abstraction
Impact Regularization
Rationality
World Modeling
Decision Theory
Human Values
Goal-Directedness
Anthropics
Utility Functions
Finite Factored Sets
Shard Theory
Fixed Point Theorems
472
Simulators
janus
3mo
103
369
What 2026 looks like
Daniel Kokotajlo
1y
98
364
chinchilla's wild implications
nostalgebraist
4mo
114
364
DeepMind alignment team opinions on AGI ruin arguments
Vika
4mo
34
344
(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen
3mo
83
338
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda
4mo
39
325
Discussion with Eliezer Yudkowsky on AGI interventions
Rob Bensinger
1y
257
291
The Parable of Predict-O-Matic
abramdemski
3y
42
287
Two-year update on my personal AI timelines
Ajeya Cotra
4mo
60
273
EfficientZero: How It Works
1a3orn
1y
42
265
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger
19d
30
258
On how various plans miss the hard bits of the alignment challenge
So8res
5mo
81
255
Are we in an AI overhang?
Andy Jones
2y
109
252
Reward is not the optimization target
TurnTrout
4mo
97
777
Where I agree and disagree with Eliezer
paulfchristiano
6mo
205
310
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
5mo
89
202
The shard theory of human values
Quintin Pope
3mo
57
183
Utility Maximization = Description Length Minimization
johnswentworth
1y
40
180
Realism about rationality
Richard_Ngo
4y
145
175
Humans provide an untapped wealth of evidence about alignment
TurnTrout
5mo
92
169
What's Up With Confusingly Pervasive Consequentialism?
Raemon
11mo
88
164
2021 AI Alignment Literature Review and Charity Comparison
Larks
12mo
26
159
Evolution of Modularity
johnswentworth
3y
12
148
My research methodology
paulfchristiano
1y
36
148
Finite Factored Sets in Pictures
Magdalena Wache
9d
29
147
Can you control the past?
Joe Carlsmith
1y
93
145
Testing The Natural Abstraction Hypothesis: Project Intro
johnswentworth
1y
34
137
Finite Factored Sets
Scott Garrabrant
1y
94