6 posts
Reward Functions
10 posts
Wireheading
Karma | Title | Author | Posted | Comments
83 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
37 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
80 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
27 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
6 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1
29 | The reward engineering problem | paulfchristiano | 3y | 3
13 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
77 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
23 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
53 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
22 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1
29 | Defining AI wireheading | Stuart_Armstrong | 3y | 9
35 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
23 | Wireheading and discontinuity | Michele Campolo | 2y | 4
34 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20