56 posts
Value Learning
The Pointers Problem
Meta-Philosophy
Metaethics
Kolmogorov Complexity
Philosophy
Perceptual Control Theory
11 posts
Inverse Reinforcement Learning
Book Reviews
109
The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables
johnswentworth
2y
43
80
Beyond Kolmogorov and Shannon
Alexander Gietelink Oldenziel
1mo
14
74
Preface to the sequence on value learning
Rohin Shah
4y
6
68
Parsing Chris Mingard on Neural Networks
Alex Flint
1y
27
67
Don't design agents which exploit adversarial inputs
TurnTrout
1mo
61
58
Humans can be assigned any values whatsoever…
Stuart_Armstrong
4y
26
57
Clarifying "AI Alignment"
paulfchristiano
4y
82
51
Intuitions about goal-directed behavior
Rohin Shah
4y
15
48
Some Thoughts on Metaphilosophy
Wei_Dai
3y
27
48
The easy goal inference problem is still hard
paulfchristiano
4y
19
47
Future directions for ambitious value learning
Rohin Shah
4y
9
46
Policy Alignment
abramdemski
4y
25
45
What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment
xuan
3mo
15
45
Using vector fields to visualise preferences and make them consistent
MichaelA
2y
32
71
[Book Review] "The Alignment Problem" by Brian Christian
lsusr
1y
16
67
Thoughts on "Human-Compatible"
TurnTrout
3y
35
34
Learning biases and rewards simultaneously
Rohin Shah
3y
3
31
Human-AI Collaboration
Rohin Shah
3y
7
27
Model Mis-specification and Inverse Reinforcement Learning
Owain_Evans
4y
3
20
Agents That Learn From Human Behavior Can't Learn Human Values That Humans Haven't Learned Yet
steven0461
4y
11
12
Delegative Inverse Reinforcement Learning
Vanessa Kosoy
5y
0
11
Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences
orthonormal
6y
0
2
CIRL Wireheading
tom4everitt
5y
0
1
(C)IRL is not solely a learning process
Stuart_Armstrong
6y
0
0
Inverse reinforcement learning on self, pre-ontology-change
Stuart_Armstrong
7y
0