Reinforcement Learning · 50 posts (subtopics: Inverse Reinforcement Learning, Road To AI Safety Excellence)
Wireheading · 23 posts (subtopic: Reward Functions)
Posts (score · title · author · posted · comments):

252 · Reward is not the optimization target · TurnTrout · 4mo · 97
8 · AGIs may value intrinsic rewards more than extrinsic ones · catubc · 1mo · 6
16 · A Survey of Foundational Methods in Inverse Reinforcement Learning · adamk · 3mo · 0
82 · Jitters No Evidence of Stupidity in RL · 1a3orn · 1y · 18
25 · Is CIRL a promising agenda? · Chris_Leong · 6mo · 12
59 · My take on Michael Littman on "The HCI of HAI" · Alex Flint · 1y · 4
16 · RLHF · Ansh Radhakrishnan · 7mo · 5
77 · Book Review: Human Compatible · Scott Alexander · 2y · 6
63 · Thoughts on "Human-Compatible" · TurnTrout · 3y · 35
15 · Scalar reward is not enough for aligned AGI · Peter Vamplew · 11mo · 3
67 · RAISE is launching their MVP · (author not listed) · 3y · 1
37 · Book review: Human Compatible · PeterMcCluskey · 2y · 2
41 · Learning biases and rewards simultaneously · Rohin Shah · 3y · 3
33 · Model Mis-specification and Inverse Reinforcement Learning · Owain_Evans · 4y · 3
10 · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1
40 · A Short Dialogue on the Meaning of Reward Functions · Leon Lang · 1mo · 0
21 · generalized wireheading · carado · 1mo · 7
76 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · TurnTrout · 4mo · 41
15 · An investigation into when agents may be incentivized to manipulate our beliefs. · Felix Hofstätter · 3mo · 0
25 · Reward model hacking as a challenge for reward learning · Erik Jenner · 8mo · 1
8 · Reinforcement Learner Wireheading · Nate Showell · 5mo · 2
47 · Draft papers for REALab and Decoupled Approval on tampering · Jonathan Uesato · 2y · 2
20 · $100/$50 rewards for good references · Stuart_Armstrong · 1y · 5
167 · Are wireheads happy? · Scott Alexander · 12y · 107
21 · Wireheading and discontinuity · Michele Campolo · 2y · 4
30 · Thoughts on reward engineering · paulfchristiano · 3y · 30
22 · Defining AI wireheading · Stuart_Armstrong · 3y · 9
26 · Wireheading is in the eye of the beholder · Stuart_Armstrong · 3y · 10