101 posts
Reinforcement Learning
AI Capabilities
Inverse Reinforcement Learning
Wireheading
Definitions
Reward Functions
Stag Hunt
Road To AI Safety Excellence
Goals
Prompt Engineering
EfficientZero
PaLM
63 posts
Value Learning
The Pointers Problem
Karma | Title | Author | Age | Comments
276 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
273 | EfficientZero: How It Works | 1a3orn | 1y | 42
252 | Reward is not the optimization target | TurnTrout | 4mo | 97
167 | Are wireheads happy? | Scott Alexander | 12y | 107
134 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
82 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
81 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
77 | Book Review: Human Compatible | Scott Alexander | 2y | 6
76 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
74 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
67 | RAISE is launching their MVP | (no author listed) | 3y | 1
63 | Thoughts on "Human-Compatible" | TurnTrout | 3y | 35
59 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
58 | Competitive programming with AlphaCode | Algon | 10mo | 37
Karma | Title | Author | Age | Comments
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
76 | The Urgent Meta-Ethics of Friendly Artificial Intelligence | lukeprog | 11y | 252
69 | The E-Coli Test for AI Alignment | johnswentworth | 4y | 24
68 | Preface to the sequence on value learning | Rohin Shah | 4y | 6
65 | Why we need a *theory* of human values | Stuart_Armstrong | 4y | 15
64 | Clarifying "AI Alignment" | paulfchristiano | 4y | 82
58 | Where do selfish values come from? | Wei_Dai | 11y | 62
56 | Humans can be assigned any values whatsoever… | Stuart_Armstrong | 4y | 26
52 | Intuitions about goal-directed behavior | Rohin Shah | 4y | 15
50 | The easy goal inference problem is still hard | paulfchristiano | 4y | 19
49 | What is ambitious value learning? | Rohin Shah | 4y | 28
49 | Conclusion to the sequence on value learning | Rohin Shah | 3y | 20
46 | Future directions for ambitious value learning | Rohin Shah | 4y | 9
42 | Different perspectives on concept extrapolation | Stuart_Armstrong | 8mo | 7