Branch 1 (344 posts)
Tags: Research Agendas, Value Learning, Reinforcement Learning, Embedded Agency, Suffering, AI Capabilities, Agency, Animal Welfare, Inverse Reinforcement Learning, Risks of Astronomical Suffering (S-risks), Wireheading, Robust Agents
Branch 2 (14230 posts)
Tags: Decision Theory, Utility Functions, Counterfactuals, Goal-Directedness, Nutrition, Newcomb's Problem, VNM Theorem, Updateless Decision Theory, Timeless Decision Theory, Literature Reviews, Functional Decision Theory, Counterfactual Mugging
Top posts, Branch 1 (score · title · author · posted · comments):
352 · EfficientZero: How It Works · 1a3orn · 1y · 42
300 · On how various plans miss the hard bits of the alignment challenge · So8res · 5mo · 81
286 · Reward is not the optimization target · TurnTrout · 4mo · 97
271 · Is AI Progress Impossible To Predict? · alyssavance · 7mo · 38
267 · Embedded Agents · abramdemski · 4y · 41
247 · Humans are very reliable agents · alyssavance · 6mo · 35
202 · Embedded Agency (full-text version) · Scott Garrabrant · 4y · 15
190 · Some conceptual alignment research projects · Richard_Ngo · 3mo · 14
176 · Are wireheads happy? · Scott Alexander · 12y · 107
158 · Being a Robust Agent · Raemon · 4y · 32
143 · Introduction to Cartesian Frames · Scott Garrabrant · 2y · 29
135 · Demand offsetting · paulfchristiano · 1y · 38
129 · EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised · gwern · 1y · 52
124 · New book on s-risks · Tobias_Baumann · 1mo · 1
Top posts, Branch 2 (score · title · author · posted · comments):
170 · Can you control the past? · Joe Carlsmith · 1y · 93
161 · Coherent decisions imply consistent utilities · Eliezer Yudkowsky · 3y · 81
155 · why assume AGIs will optimize for fixed goals? · nostalgebraist · 6mo · 52
146 · Newcomb's Problem and Regret of Rationality · Eliezer Yudkowsky · 14y · 614
146 · 2020 AI Alignment Literature Review and Charity Comparison · Larks · 1y · 14
146 · Decision theory does not imply that we get to have nice things · So8res · 2mo · 53
141 · Decision Theory · abramdemski · 4y · 46
137 · An Orthodox Case Against Utility Functions · abramdemski · 2y · 53
137 · Impossibility results for unbounded utilities · paulfchristiano · 10mo · 104
120 · Saving Time · Scott Garrabrant · 1y · 19
117 · How I Lost 100 Pounds Using TDT · Zvi · 11y · 244
108 · Decision Theory FAQ · lukeprog · 9y · 484
105 · Utility ≠ Reward · vlad_m · 3y · 25
102 · Coherence arguments do not entail goal-directed behavior · Rohin Shah · 4y · 69