Reinforcement Learning, Wireheading, Reward Functions (30 posts)

Karma | Title | Author | Age | Comments
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
80 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
83 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
77 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
38 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
29 | The reward engineering problem | paulfchristiano | 3y | 3
6 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
6 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2
34 | Wireheading as a potential problem with the new impact measure | Stuart_Armstrong | 4y | 20
35 | Wireheading is in the eye of the beholder | Stuart_Armstrong | 3y | 10
AI Capabilities, EfficientZero, Tradeoffs (11 posts)

Karma | Title | Author | Age | Comments
81 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
38 | It matters when the first sharp left turn happens | Adam Jermyn | 2mo | 9
44 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
98 | The alignment problem in different capability regimes | Buck | 1y | 12
108 | We have achieved Noob Gains in AI | phdead | 7mo | 21
25 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9
8 | Epistemic Strategies of Safety-Capabilities Tradeoffs | adamShimi | 1y | 0
144 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
212 | EfficientZero: How It Works | 1a3orn | 1y | 42
77 | OpenAI Solves (Some) Formal Math Olympiad Problems | Michaël Trazzi | 10mo | 26
70 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17