Tree of Tags

41 posts: Reinforcement Learning, AI Capabilities, Wireheading, Reward Functions, EfficientZero, Tradeoffs

33 posts: Embedded Agency, Subagents, Robust Agents, Category Theory, Spurious Counterfactuals, Memetics, Autonomous Vehicles

233 Reward is not the optimization target

TurnTrout

4mo

97

212 EfficientZero: How It Works

1a3orn

1y

42

144 EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern

1y

52

108 We have achieved Noob Gains in AI

phdead

7mo

21

98 The alignment problem in different capability regimes

Buck

1y

12

87 Jitters No Evidence of Stupidity in RL

1a3orn

1y

18

83 Scaling Laws for Reward Model Overoptimization

leogao

2mo

11

81 Evaluations project @ ARC is hiring a researcher and a webdev/engineer

Beth Barnes

3mo

7

80 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

77 Towards deconfusing wireheading and reward maximization

leogao

3mo

7

77 OpenAI Solves (Some) Formal Math Olympiad Problems

Michaël Trazzi

10mo

26

70 Misc. questions about EfficientZero

Daniel Kokotajlo

1y

17

66 My take on Michael Littman on "The HCI of HAI"

Alex Flint

1y

4

53 Draft papers for REALab and Decoupled Approval on tampering

Jonathan Uesato

2y

2

259 Humans are very reliable agents

alyssavance

6mo

35

155 Introduction to Cartesian Frames

Scott Garrabrant

2y

29

134 Why Subagents?

johnswentworth

3y

42

109 Robust Delegation

abramdemski

4y

10

103 Reward Is Not Enough

Steven Byrnes

1y

18

95 Embedded Agency (full-text version)

Scott Garrabrant

4y

15

93 Updates and additions to "Embedded Agency"

Rob Bensinger

2y

1

93 Subsystem Alignment

abramdemski

4y

12

93 Humans Are Embedded Agents Too

johnswentworth

2y

19

80 Embedded Curiosities

Scott Garrabrant

4y

1

75 Embedded World-Models

abramdemski

4y

16

70 Additive Operations on Cartesian Frames

Scott Garrabrant

2y

6

62 Functors and Coarse Worlds

Scott Garrabrant

2y

4

62 (A -> B) -> A

Scott Garrabrant

4y

11