Tree of Tags

Reward Functions (6 posts)

Score | Title | Author | Age | Comments
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
44 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1
13 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
23 | The reward engineering problem | paulfchristiano | 3y | 3

Wireheading (10 posts)

Score | Title | Author | Age | Comments
7 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
9 | Value extrapolation vs Wireheading | Stuart_Armstrong | 6mo | 1
41 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
12 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1
19 | Wireheading and discontinuity | Michele Campolo | 2y | 4
15 | Defining AI wireheading | Stuart_Armstrong | 3y | 9
17 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
16 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20