50 posts
Reinforcement Learning
Inverse Reinforcement Learning
Road To AI Safety Excellence
23 posts
Wireheading
Reward Functions
286
Reward is not the optimization target
TurnTrout
4mo
97
80
Jitters No Evidence of Stupidity in RL
1a3orn
1y
18
71
RAISE is launching their MVP
3y
1
70
Thoughts on "Human-Compatible"
TurnTrout
3y
35
62
Book Review: Human Compatible
Scott Alexander
2y
6
55
My take on Michael Littman on "The HCI of HAI"
Alex Flint
1y
4
37
AI Safety Prerequisites Course: Basic abstract representations of computation
RAISE
3y
2
36
Learning biases and rewards simultaneously
Rohin Shah
3y
3
34
Making a Difference Tempore: Insights from 'Reinforcement Learning: An Introduction'
TurnTrout
4y
6
29
Our plan for 2019-2020: consulting for AI Safety education
RAISE
3y
17
29
Scalar reward is not enough for aligned AGI
Peter Vamplew
11mo
3
28
IRL 1/8: Inverse Reinforcement Learning and the problem of degeneracy
RAISE
3y
2
28
Model Mis-specification and Inverse Reinforcement Learning
Owain_Evans
4y
3
26
Reinforcement learning with imperceptible rewards
Vanessa Kosoy
3y
1
176
Are wireheads happy?
Scott Alexander
12y
107
75
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
49
You cannot be mistaken about (not) wanting to wirehead
Kaj_Sotala
12y
79
47
Reward model hacking as a challenge for reward learning
Erik Jenner
8mo
1
45
A Short Dialogue on the Meaning of Reward Functions
Leon Lang
1mo
0
43
Draft papers for REALab and Decoupled Approval on tampering
Jonathan Uesato
2y
2
38
A definition of wireheading
Anja
10y
80
32
The Stamp Collector
So8res
7y
14
26
generalized wireheading
carado
1mo
7
24
Thoughts on reward engineering
paulfchristiano
3y
30
20
Reinforcement Learner Wireheading
Nate Showell
5mo
2
20
Wireheading and discontinuity
Michele Campolo
2y
4
19
An investigation into when agents may be incentivized to manipulate our beliefs.
Felix Hofstätter
3mo
0
18
Wireheading is in the eye of the beholder
Stuart_Armstrong
3y
10