Tree of Tags

14 posts Reinforcement Learning

271 | Reward is not the optimization target | TurnTrout | 4mo | 97
80 | Big picture of phasic dopamine | Steven Byrnes | 1y | 18
77 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
52 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4
28 | Scalar reward is not enough for aligned AGI | Peter Vamplew | 11mo | 3
26 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
25 | Reinforcement learning with imperceptible rewards | Vanessa Kosoy | 3y | 1
19 | A model of decision-making in the brain (the short version) | Steven Byrnes | 1y | 0
11 | Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts | sage_bergerson | 1y | 0
2 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
2 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0

16 posts Wireheading Reward Functions

89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
44 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
41 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2
23 | The reward engineering problem | paulfchristiano | 3y | 3
19 | Wireheading and discontinuity | Michele Campolo | 2y | 4
17 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
16 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
15 | Defining AI wireheading | Stuart_Armstrong | 3y | 9
13 | $100/$50 rewards for good references | Stuart_Armstrong | 1y | 5
12 | Model-based RL, Desires, Brains, Wireheading | Steven Byrnes | 1y | 1