Tags (1564 posts): AI, Inner Alignment, Interpretability (ML & AI), AI Timelines, GPT, Research Agendas, AI Takeoff, Value Learning, Machine Learning (ML), Conjecture (org), Mesa-Optimization, Outer Alignment
Tags (349 posts): Abstraction, Impact Regularization, Rationality, World Modeling, Decision Theory, Human Values, Goal-Directedness, Anthropics, Utility Functions, Finite Factored Sets, Shard Theory, Fixed Point Theorems
Karma | Title | Author | Age | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
259 | Humans are very reliable agents | alyssavance | 6mo | 35
259 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
252 | What 2026 looks like | Daniel Kokotajlo | 1y | 98
242 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
241 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
234 | chinchilla's wild implications | nostalgebraist | 4mo | 114
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
231 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
219 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
573 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
239 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
201 | What's Up With Confusingly Pervasive Consequentialism? | Raemon | 11mo | 88
170 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
159 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
159 | My research methodology | paulfchristiano | 1y | 36
159 | Testing The Natural Abstraction Hypothesis: Project Intro | johnswentworth | 1y | 34
155 | Evolution of Modularity | johnswentworth | 3y | 12
155 | The shard theory of human values | Quintin Pope | 3mo | 57
154 | Realism about rationality | Richard_Ngo | 4y | 145
153 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 12mo | 26
146 | Saving Time | Scott Garrabrant | 1y | 19
145 | Fixing The Good Regulator Theorem | johnswentworth | 1y | 25
140 | Shard Theory: An Overview | David Udell | 4mo | 34