Tree of Tags

6 posts Reward Functions

83 Scaling Laws for Reward Model Overoptimization

leogao

2mo

11

37 A Short Dialogue on the Meaning of Reward Functions

Leon Lang

1mo

0

80 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

27 $100/$50 rewards for good references

Stuart_Armstrong

1y

5

6 Reward model hacking as a challenge for reward learning

Erik Jenner

8mo

1

29 The reward engineering problem

paulfchristiano

3y

3

10 posts Wireheading

13 Note on algorithms with multiple trained components

Steven Byrnes

7h

1

77 Towards deconfusing wireheading and reward maximization

leogao

3mo

7

42 Four usages of "loss" in AI

TurnTrout

2mo

18

23 Value extrapolation vs Wireheading

Stuart_Armstrong

6mo

1

53 Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato

2y

2

22 Model-based RL, Desires, Brains, Wireheading

Steven Byrnes

1y

1

29 Defining AI wireheading

Stuart_Armstrong

3y

9

35 Wireheading is in the eye of the beholder

Stuart_Armstrong

3y

10

23 Wireheading and discontinuity

Michele Campolo

2y

4

34 Wireheading as a potential problem with the new impact measure

Stuart_Armstrong

4y

20