Tree of Tags

12 posts Wireheading

11 posts Reward Functions

7 Note on algorithms with multiple trained components

Steven Byrnes

6h

1

26 generalized wireheading

carado

1mo

7

20 Reinforcement Learner Wireheading

Nate Showell

5mo

2

32 The Stamp Collector

So8res

7y

14

49 You cannot be mistaken about (not) wanting to wirehead

Kaj_Sotala

12y

79

17 Wireheading as a potential problem with the new impact measure

Stuart_Armstrong

4y

20

18 Wireheading is in the eye of the beholder

Stuart_Armstrong

3y

10

176 Are wireheads happy?

Scott Alexander

12y

107

20 Wireheading and discontinuity

Michele Campolo

2y

4

38 A definition of wireheading

Anja

10y

80

14 Wireheading Done Right: Stay Positive Without Going Insane

9eB1

6y

2

15 Defining AI wireheading

Stuart_Armstrong

3y

9

75 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

8 Reward IS the Optimization Target

Carn

2mo

3

7 Reward function learning: the value function

Stuart_Armstrong

4y

0

45 A Short Dialogue on the Meaning of Reward Functions

Leon Lang

1mo

0

9 Why we want unbiased learning processes

Stuart_Armstrong

4y

3

24 Thoughts on reward engineering

paulfchristiano

3y

30

43 Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato

2y

2

19 An investigation into when agents may be incentivized to manipulate our beliefs.

Felix Hofstätter

3mo

0

13 $100/$50 rewards for good references

Stuart_Armstrong

1y

5

3 Reward function learning: the learning process

Stuart_Armstrong

4y

11

47 Reward model hacking as a challenge for reward learning

Erik Jenner

8mo

1