Tags: Reinforcement Learning (14 posts) · Wireheading (16 posts) · Reward Functions
Karma | Title | Author | Posted | Comments
271 | Reward is not the optimization target | TurnTrout | 4mo | 97
26 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
2 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
2 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
77 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
25 | Reinforcement learning with imperceptible rewards | Vanessa Kosoy | 3y | 1
19 | A model of decision-making in the brain (the short version) | Steven Byrnes | 1y | 0
28 | Scalar reward is not enough for aligned AGI | Peter Vamplew | 11mo | 3
80 | Big picture of phasic dopamine | Steven Byrnes | 1y | 18
11 | Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts | sage_bergerson | 1y | 0
52 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
23 | The reward engineering problem | paulfchristiano | 3y | 3
16 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
17 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
12 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1
19 | Wireheading and discontinuity | Michele Campolo | 2y | 4
41 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
13 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
7 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
9 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1