344 posts
Research Agendas
Value Learning
Reinforcement Learning
Embedded Agency
Suffering
AI Capabilities
Agency
Animal Welfare
Inverse Reinforcement Learning
Risks of Astronomical Suffering (S-risks)
Wireheading
Robust Agents
14230 posts
Decision Theory
Utility Functions
Counterfactuals
Goal-Directedness
Nutrition
Newcomb's Problem
VNM Theorem
Updateless Decision Theory
Timeless Decision Theory
Literature Reviews
Functional Decision Theory
Counterfactual Mugging
286
Reward is not the optimization target
TurnTrout
4mo
97
7
Note on algorithms with multiple trained components
Steven Byrnes
6h
1
34
My AGI safety research—2022 review, ’23 plans
Steven Byrnes
6d
6
300
On how various plans miss the hard bits of the alignment challenge
So8res
5mo
81
109
Will we run out of ML data? Evidence from projecting dataset size trends
Pablo Villalobos
1mo
12
91
When AI solves a game, focus on the game's mechanics, not its theme.
Cleo Nardo
27d
7
75
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
23
Should you refrain from having children because of the risk posed by artificial intelligence?
Mientras
3mo
28
26
generalized wireheading
carado
1mo
7
35
What's the Most Impressive Thing That GPT-4 Could Plausibly Do?
bayesed
3mo
24
11
AGIs may value intrinsic rewards more than extrinsic ones
catubc
1mo
6
25
Latent Variables and Model Mis-Specification
jsteinhardt
4y
7
190
Some conceptual alignment research projects
Richard_Ngo
3mo
14
10
Stable Pointers to Value: An Agent Embedded in Its Own Utility Function
abramdemski
5y
9
28
K-complexity is silly; use cross-entropy instead
So8res
1h
4
170
Can you control the past?
Joe Carlsmith
1y
93
-6
Ponzi schemes can be highly profitable if your timing is good
GeneSmith
8d
18
36
Take 7: You should talk about "the human's utility function" less.
Charlie Steiner
12d
22
96
wrapper-minds are the enemy
nostalgebraist
6mo
36
46
What videos should Rational Animations make?
Writer
24d
23
146
Decision theory does not imply that we get to have nice things
So8res
2mo
53
23
"Attention Passengers": not for Signs
jefftk
13d
10
48
Notes on "Can you control the past"
So8res
2mo
40
36
Humans do acausal coordination all the time
Adam Jermyn
1mo
36
19
Decision Theory but also Ghosts
eva_
1mo
21
7
Cerebras Systems unveils a record 1.2 trillion transistor chip for AI
avturchin
3y
4
-31
ChatGPT's new novel rationality technique of fact checking
ChristianKl
9d
5
24
Two New Newcomb Variants
eva_
1mo
22