14 posts
Reinforcement Learning
16 posts
Wireheading
Reward Functions
233
Reward is not the optimization target
TurnTrout
4mo
97
38
Conditioning, Prompts, and Fine-Tuning
Adam Jermyn
4mo
9
87
Jitters No Evidence of Stupidity in RL
1a3orn
1y
18
66
My take on Michael Littman on "The HCI of HAI"
Alex Flint
1y
4
38
Big picture of phasic dopamine
Steven Byrnes
1y
18
21
A model of decision-making in the brain (the short version)
Steven Byrnes
1y
0
27
Reinforcement learning with imperceptible rewards
Vanessa Kosoy
3y
1
2
Scalar reward is not enough for aligned AGI
Peter Vamplew
11mo
3
6
Modeling the capabilities of advanced AI systems as episodic reinforcement learning
jessicata
6y
0
6
Some work on connecting UDT and Reinforcement Learning
IAFF-User-111
7y
0
2
Vector-Valued Reinforcement Learning
orthonormal
6y
0
1
Delegative Reinforcement Learning with a Merely Sane Advisor
Vanessa Kosoy
5y
2
0
Reward/value learning for reinforcement learning
Stuart_Armstrong
5y
0
-1
Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts
sage_bergerson
1y
0
13
Note on algorithms with multiple trained components
Steven Byrnes
7h
1
83
Scaling Laws for Reward Model Overoptimization
leogao
2mo
11
37
A Short Dialogue on the Meaning of Reward Functions
Leon Lang
1mo
0
77
Towards deconfusing wireheading and reward maximization
leogao
3mo
7
80
Seriously, what goes wrong with "reward the agent when it makes you smile"?
TurnTrout
4mo
41
42
Four usages of "loss" in AI
TurnTrout
2mo
18
23
Value extrapolation vs Wireheading
Stuart_Armstrong
6mo
1
27
$100/$50 rewards for good references
Stuart_Armstrong
1y
5
53
Draft papers for REALab and Decoupled Approval on tampering
Jonathan Uesato
2y
2
22
Model-based RL, Desires, Brains, Wireheading
Steven Byrnes
1y
1
6
Reward model hacking as a challenge for reward learning
Erik Jenner
8mo
1
29
Defining AI wireheading
Stuart_Armstrong
3y
9
35
Wireheading is in the eye of the beholder
Stuart_Armstrong
3y
10
23
Wireheading and discontinuity
Michele Campolo
2y
4