Reward Functions (6 posts)

Karma | Title | Author | Age | Comments
89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
44 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
23 | The reward engineering problem | paulfchristiano | 3y | 3
13 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5

Wireheading (10 posts)
Karma | Title | Author | Age | Comments
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
41 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
19 | Wireheading and discontinuity | Michele Campolo | 2y | 4
17 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
16 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
15 | Defining AI wireheading | Stuart_Armstrong | 3y | 9
12 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1
9 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
7 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1