Reinforcement Learning · 50 posts (subtopics: Inverse Reinforcement Learning, Road To AI Safety Excellence)
Wireheading · 23 posts (subtopic: Reward Functions)
Posts (score · title · author · posted · comments):

252 · Reward is not the optimization target · TurnTrout · 4mo · 97
8 · AGIs may value intrinsic rewards more than extrinsic ones · catubc · 1mo · 6
16 · A Survey of Foundational Methods in Inverse Reinforcement Learning · adamk · 3mo · 0
82 · Jitters No Evidence of Stupidity in RL · 1a3orn · 1y · 18
25 · Is CIRL a promising agenda? · Chris_Leong · 6mo · 12
59 · My take on Michael Littman on "The HCI of HAI" · Alex Flint · 1y · 4
16 · RLHF · Ansh Radhakrishnan · 7mo · 5
77 · Book Review: Human Compatible · Scott Alexander · 2y · 6
63 · Thoughts on "Human-Compatible" · TurnTrout · 3y · 35
15 · Scalar reward is not enough for aligned AGI · Peter Vamplew · 11mo · 3
67 · RAISE is launching their MVP · (author not listed) · 3y · 1
37 · Book review: Human Compatible · PeterMcCluskey · 2y · 2
41 · Learning biases and rewards simultaneously · Rohin Shah · 3y · 3
33 · Model Mis-specification and Inverse Reinforcement Learning · Owain_Evans · 4y · 3
10 · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1
40 · A Short Dialogue on the Meaning of Reward Functions · Leon Lang · 1mo · 0
21 · generalized wireheading · carado · 1mo · 7
76 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · TurnTrout · 4mo · 41
15 · An investigation into when agents may be incentivized to manipulate our beliefs. · Felix Hofstätter · 3mo · 0
25 · Reward model hacking as a challenge for reward learning · Erik Jenner · 8mo · 1
8 · Reinforcement Learner Wireheading · Nate Showell · 5mo · 2
47 · Draft papers for REALab and Decoupled Approval on tampering · Jonathan Uesato · 2y · 2
20 · $100/$50 rewards for good references · Stuart_Armstrong · 1y · 5
167 · Are wireheads happy? · Scott Alexander · 12y · 107
21 · Wireheading and discontinuity · Michele Campolo · 2y · 4
30 · Thoughts on reward engineering · paulfchristiano · 3y · 30
22 · Defining AI wireheading · Stuart_Armstrong · 3y · 9
26 · Wireheading is in the eye of the beholder · Stuart_Armstrong · 3y · 10