Tree of Tags

41 posts: Reinforcement Learning, AI Capabilities, Wireheading, Reward Functions, EfficientZero, Tradeoffs

33 posts: Embedded Agency, Subagents, Robust Agents, Category Theory, Spurious Counterfactuals, Memetics, Autonomous Vehicles

7 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
104 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12
271 | Reward is not the optimization target | TurnTrout | 4mo | 97
43 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0
89 | Scaling Laws for Reward Model Overoptimization | leogao | 2mo | 11
107 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
334 | EfficientZero: How It Works | 1a3orn | 1y | 42
61 | Towards deconfusing wireheading and reward maximization | leogao | 3mo | 7
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
72 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
120 | We have achieved Noob Gains in AI | phdead | 7mo | 21
32 | It matters when the first sharp left turn happens | Adam Jermyn | 2mo | 9
124 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
43 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9
237 | Humans are very reliable agents | alyssavance | 6mo | 35
43 | Gradations of Agency | Daniel Kokotajlo | 7mo | 6
107 | Reward Is Not Enough | Steven Byrnes | 1y | 18
135 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
24 | You Only Get One Shot: an Intuition Pump for Embedded Agency | Oliver Sourbut | 6mo | 4
188 | Why Subagents? | johnswentworth | 3y | 42
191 | Embedded Agency (full-text version) | Scott Garrabrant | 4y | 15
111 | Robust Delegation | abramdemski | 4y | 10
52 | Additive Operations on Cartesian Frames | Scott Garrabrant | 2y | 6
107 | Subsystem Alignment | abramdemski | 4y | 12
53 | Updates and additions to "Embedded Agency" | Rob Bensinger | 2y | 1
47 | Subagents of Cartesian Frames | Scott Garrabrant | 2y | 5
96 | Embedded World-Models | abramdemski | 4y | 16
96 | Embedded Curiosities | Scott Garrabrant | 4y | 1