Branch 1: 180 posts
Tags: Research Agendas, Embedded Agency, Suffering, Agency, Animal Welfare, Risks of Astronomical Suffering (S-risks), Robust Agents, Cause Prioritization, Center on Long-Term Risk (CLR), 80,000 Hours, Crucial Considerations, Veg*nism
Branch 2: 164 posts
Tags: Value Learning, Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, The Pointers Problem, Stag Hunt, Road To AI Safety Excellence, Goals, EfficientZero
Top posts in branch 1:

Karma | Title | Author | Posted | Comments
249 | Humans are very reliable agents | alyssavance | 6mo | 35
216 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
147 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
146 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
129 | Embedded Agents | abramdemski | 4y | 41
127 | Demand offsetting | paulfchristiano | 1y | 38
117 | Our take on CHAI’s research agenda in under 1500 words | Alex Flint | 2y | 19
111 | "Just Suffer Until It Passes" | lionhearted | 4y | 26
105 | Wirehead your Chickens | shminux | 4y | 53
104 | Botworld: a cellular automaton for studying self-modifying agents embedded in their environment | So8res | 8y | 55
103 | The Power of Agency | lukeprog | 11y | 78
102 | Being a Robust Agent | Raemon | 4y | 32
102 | Robust Delegation | abramdemski | 4y | 10
96 | Announcement: AI alignment prize round 3 winners and next round | cousin_it | 4y | 7
Top posts in branch 2:

Karma | Title | Author | Posted | Comments
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
194 | EfficientZero: How It Works | 1a3orn | 1y | 42
158 | Are wireheads happy? | Scott Alexander | 12y | 107
139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
97 | The Urgent Meta-Ethics of Friendly Artificial Intelligence | lukeprog | 11y | 252
93 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
92 | Book Review: Human Compatible | Scott Alexander | 2y | 6
84 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
74 | Where do selfish values come from? | Wei_Dai | 11y | 62
71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
69 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17
68 | Clarifying "AI Alignment" | paulfchristiano | 4y | 82