Tags: Value Learning (56 posts), The Pointers Problem, Meta-Philosophy, Metaethics, Kolmogorov Complexity, Philosophy, Perceptual Control Theory, Inverse Reinforcement Learning (11 posts), Book Reviews
Karma | Title | Author | Posted | Comments
60  | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
32  | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
15  | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | abramdemski | 5y | 9
30  | What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment | xuan | 3mo | 15
60  | Beyond Kolmogorov and Shannon | Alexander Gietelink Oldenziel | 1mo | 14
104 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
56  | Humans can be assigned any values whatsoever… | Stuart_Armstrong | 4y | 26
10  | AIs should learn human preferences, not biases | Stuart_Armstrong | 8mo | 1
17  | RFC: Philosophical Conservatism in AI Alignment Research | Gordon Seidoh Worley | 4y | 13
34  | Human-AI Interaction | Rohin Shah | 3y | 10
1   | Humans can be assigned any values whatsoever... | Stuart_Armstrong | 5y | 0
0   | Kolmogorov complexity makes reward learning worse | Stuart_Armstrong | 5y | 0
49  | What is ambitious value learning? | Rohin Shah | 4y | 28
67  | Parsing Chris Mingard on Neural Networks | Alex Flint | 1y | 27
0   | Inverse reinforcement learning on self, pre-ontology-change | Stuart_Armstrong | 7y | 0
14  | Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences | orthonormal | 6y | 0
1   | (C)IRL is not solely a learning process | Stuart_Armstrong | 6y | 0
3   | CIRL Wireheading | tom4everitt | 5y | 0
15  | Delegative Inverse Reinforcement Learning | Vanessa Kosoy | 5y | 0
27  | Agents That Learn From Human Behavior Can't Learn Human Values That Humans Haven't Learned Yet | steven0461 | 4y | 11
63  | Thoughts on "Human-Compatible" | TurnTrout | 3y | 35
70  | [Book Review] "The Alignment Problem" by Brian Christian | lsusr | 1y | 16
33  | Model Mis-specification and Inverse Reinforcement Learning | Owain_Evans | 4y | 3
42  | Human-AI Collaboration | Rohin Shah | 3y | 7
41  | Learning biases and rewards simultaneously | Rohin Shah | 3y | 3