14 posts
Reinforcement Learning
16 posts
Wireheading
Reward Functions
Karma | Title | Author | Age | Comments
271 | Reward is not the optimization target | TurnTrout | 4mo | 97
80 | Big picture of phasic dopamine | Steven Byrnes | 1y | 18
77 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
52 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
28 | Scalar reward is not enough for aligned AGI | Peter Vamplew | 11mo | 3
26 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
25 | Reinforcement learning with imperceptible rewards | Vanessa Kosoy | 3y | 1
19 | A model of decision-making in the brain (the short version) | Steven Byrnes | 1y | 0
11 | Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts | sage_bergerson | 1y | 0
2 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
2 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0

89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
44 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
41 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
23 | The reward engineering problem | paulfchristiano | 3y | 3
19 | Wireheading and discontinuity | Michele Campolo | 2y | 4
17 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
16 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
15 | Defining AI wireheading | Stuart_Armstrong | 3y | 9
13 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
12 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1