101 posts
Reinforcement Learning
AI Capabilities
Inverse Reinforcement Learning
Wireheading
Definitions
Reward Functions
Stag Hunt
Road To AI Safety Excellence
Goals
Prompt Engineering
EfficientZero
PaLM
63 posts
Value Learning
The Pointers Problem
281
Is AI Progress Impossible To Predict?
alyssavance
7mo
38
218
Reward is not the optimization target
TurnTrout
4mo
97
194
EfficientZero: How It Works
1a3orn
1y
42
158
Are wireheads happy?
Scott Alexander
12y
107
139
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
gwern
1y
52
92
Book Review: Human Compatible
Scott Alexander
2y
6
84
Jitters No Evidence of Stupidity in RL
1a3orn
1y
18
77
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
71
When AI solves a game, focus on the game's mechanics, not its theme.
Cleo Nardo
27d
7
69
Misc. questions about EfficientZero
Daniel Kokotajlo
1y
17
66
A definition of wireheading
Anja
10y
80
63
RAISE is launching their MVP
3y
1
63
My take on Michael Littman on "The HCI of HAI"
Alex Flint
1y
4
58
The Stamp Collector
So8res
7y
14
97
The Urgent Meta-Ethics of Friendly Artificial Intelligence
lukeprog
11y
252
93
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
johnswentworth
2y
43
74
Where do selfish values come from?
Wei_Dai
11y
62
68
Clarifying "AI Alignment"
paulfchristiano
4y
82
64
Why we need a *theory* of human values
Stuart_Armstrong
4y
15
59
Preface to the sequence on value learning
Rohin Shah
4y
6
58
The E-Coli Test for AI Alignment
johnswentworth
4y
24
54
Conclusion to the sequence on value learning
Rohin Shah
3y
20
52
What is ambitious value learning?
Rohin Shah
4y
28
51
Humans can be assigned any values whatsoever…
Stuart_Armstrong
4y
26
51
Intuitions about goal-directed behavior
Rohin Shah
4y
15
50
The easy goal inference problem is still hard
paulfchristiano
4y
19
48
Different perspectives on concept extrapolation
Stuart_Armstrong
8mo
7
45
Two questions about CEV that worry me
cousin_it
12y
142