Tags: Reinforcement Learning (14 posts) · Wireheading (16 posts) · Reward Functions
Karma | Title | Author | Posted | Comments
252 | Reward is not the optimization target | TurnTrout | 4mo | 97
32 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
4 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
4 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
82 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
26 | Reinforcement learning with imperceptible rewards | Vanessa Kosoy | 3y | 1
20 | A model of decision-making in the brain (the short version) | Steven Byrnes | 1y | 0
15 | Scalar reward is not enough for aligned AGI | Peter Vamplew | 11mo | 3
59 | Big picture of phasic dopamine | Steven Byrnes | 1y | 18
5 | Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts | sage_bergerson | 1y | 0
59 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
76 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
86 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
69 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
26 | The reward engineering problem | paulfchristiano | 3y | 3
25 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
26 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
40 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
17 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1
21 | Wireheading and discontinuity | Michele Campolo | 2y | 4
47 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
20 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
10 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
16 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1