Reinforcement Learning (14 posts)

Karma | Title | Author | Age | Comments
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
87 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
66 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
38 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
38 | Big picture of phasic dopamine | Steven Byrnes | 1y | 18
27 | Reinforcement learning with imperceptible rewards | Vanessa Kosoy | 3y | 1
21 | A model of decision-making in the brain (the short version) | Steven Byrnes | 1y | 0
6 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
6 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
2 | Scalar reward is not enough for aligned AGI | Peter Vamplew | 11mo | 3
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0
-1 | Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts | sage_bergerson | 1y | 0

Wireheading / Reward Functions (16 posts)

Karma | Title | Author | Age | Comments
83 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
80 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
77 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
53 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
37 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
35 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
34 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
29 | The reward engineering problem | paulfchristiano | 3y | 3
29 | Defining AI wireheading | Stuart_Armstrong | 3y | 9
27 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
23 | Wireheading and discontinuity | Michele Campolo | 2y | 4
23 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
22 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1