AI (4148 posts)
Tags: AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law
Decision Theory (14574 posts)
Tags: Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
Karma | Title | Author | Posted | Comments
27 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3
40 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
10 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11
108 | The next decades might be wild | Marius Hobbhahn | 5d | 21
0 | I believe some AI doomers are overconfident | FTPickle | 6h | 4
33 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
57 | Reframing inner alignment | davidad | 9d | 13
3 | Will research in AI risk jinx it? Consequences of training AI on AI risk arguments | Yann Dubois | 1d | 6
70 | Bad at Arithmetic, Promising at Math | cohenmacaulay | 2d | 17
22 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
43 | Next Level Seinfeld | Zvi | 1d | 6
46 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
13 | Will Machines Ever Rule the World? MLAISU W50 | Esben Kran | 4d | 4
84 | Inner and outer alignment decompose one hard problem into two extremely hard problems | TurnTrout | 18d | 18
46 | K-complexity is silly; use cross-entropy instead | So8res | 1h | 4
124 | Can you control the past? | Joe Carlsmith | 1y | 93
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
13 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1
14 | Ponzi schemes can be highly profitable if your timing is good | GeneSmith | 8d | 18
34 | My AGI safety research—2022 review, ’23 plans | Steven Byrnes | 6d | 6
58 | Take 7: You should talk about "the human's utility function" less. | Charlie Steiner | 12d | 22
88 | wrapper-minds are the enemy | nostalgebraist | 6mo | 36
14 | What videos should Rational Animations make? | Writer | 24d | 23
138 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
31 | "Attention Passengers": not for Signs | jefftk | 13d | 10
62 | Notes on "Can you control the past" | So8res | 2mo | 40
66 | Humans do acausal coordination all the time | Adam Jermyn | 1mo | 36
15 | Decision Theory but also Ghosts | eva_ | 1mo | 21