Tree of Tags

41 posts Reinforcement Learning AI Capabilities Wireheading Reward Functions EfficientZero Tradeoffs

33 posts Embedded Agency Subagents Robust Agents Category Theory Spurious Counterfactuals Memetics Autonomous Vehicles

233 | Reward is not the optimization target | TurnTrout | 4mo | 97
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
81 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
80 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
83 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
38 | It matters when the first sharp left turn happens | Adam Jermyn | 2mo | 9
77 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
38 | Conditioning, Prompts, and Fine-Tuning | Adam Jermyn | 4mo | 9
29 | The reward engineering problem | paulfchristiano | 3y | 3
6 | Some work on connecting UDT and Reinforcement Learning | IAFF-User-111 | 7y | 0
6 | Modeling the capabilities of advanced AI systems as episodic reinforcement learning | jessicata | 6y | 0
2 | Vector-Valued Reinforcement Learning | orthonormal | 6y | 0
0 | Reward/value learning for reinforcement learning | Stuart_Armstrong | 5y | 0
1 | Delegative Reinforcement Learning with a Merely Sane Advisor | Vanessa Kosoy | 5y | 2

37 | Gradations of Agency | Daniel Kokotajlo | 7mo | 6
134 | Why Subagents? | johnswentworth | 3y | 42
259 | Humans are very reliable agents | alyssavance | 6mo | 35
37 | Committing, Assuming, Externalizing, and Internalizing | Scott Garrabrant | 2y | 25
39 | Eight Definitions of Observability | Scott Garrabrant | 2y | 26
42 | What if memes are common in highly capable minds? | Daniel Kokotajlo | 2y | 15
93 | Updates and additions to "Embedded Agency" | Rob Bensinger | 2y | 1
155 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
93 | Subsystem Alignment | abramdemski | 4y | 12
62 | Functors and Coarse Worlds | Scott Garrabrant | 2y | 4
20 | You Only Get One Shot: an Intuition Pump for Embedded Agency | Oliver Sourbut | 6mo | 4
59 | Time in Cartesian Frames | Scott Garrabrant | 2y | 16
38 | Logical Updatelessness as a Robust Delegation Problem | Scott Garrabrant | 5y | 2
45 | Sub-Sums and Sub-Tensors | Scott Garrabrant | 2y | 4