Tree of Tags

12 posts Wireheading

158 Are wireheads happy?

Scott Alexander

12y

107

66 A definition of wireheading

Anja

10y

80

58 The Stamp Collector

So8res

7y

14

39 You cannot be mistaken about (not) wanting to wirehead

Kaj_Sotala

12y

79

34 Wireheading is in the eye of the beholder

Stuart_Armstrong

3y

10

33 Wireheading as a potential problem with the new impact measure

Stuart_Armstrong

4y

20

29 Defining AI wireheading

Stuart_Armstrong

3y

9

22 Wireheading and discontinuity

Michele Campolo

2y

4

16 generalized wireheading

carado

1mo

7

13 Note on algorithms with multiple trained components

Steven Byrnes

6h

1

0 Wireheading Done Right: Stay Positive Without Going Insane

9eB1

6y

2

-4 Reinforcement Learner Wireheading

Nate Showell

5mo

2

11 posts Reward Functions

77 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

51 Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato

2y

2

36 Thoughts on reward engineering

paulfchristiano

3y

30

35 A Short Dialogue on the Meaning of Reward Functions

Leon Lang

1mo

0

27 $100/$50 rewards for good references

Stuart_Armstrong

1y

5

17 Why we want unbiased learning processes

Stuart_Armstrong

4y

3

11 Reward function learning: the value function

Stuart_Armstrong

4y

0

11 An investigation into when agents may be incentivized to manipulate our beliefs.

Felix Hofstätter

3mo

0

9 Reward function learning: the learning process

Stuart_Armstrong

4y

11

3 Reward model hacking as a challenge for reward learning

Erik Jenner

8mo

1

-10 Reward IS the Optimization Target

Carn

2mo

3