Reinforcement Learning (73 posts)
Related tags: Inverse Reinforcement Learning, Wireheading, Reward Functions, Road To AI Safety Excellence
AI Capabilities (28 posts)
Related tags: Definitions, Stag Hunt, Goals, Prompt Engineering, PaLM, EfficientZero
| Karma | Title | Author | Age | Comments |
|---|---|---|---|---|
| 286 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 176 | Are wireheads happy? | Scott Alexander | 12y | 107 |
| 80 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18 |
| 75 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41 |
| 71 | RAISE is launching their MVP | | 3y | 1 |
| 70 | Thoughts on "Human-Compatible" | TurnTrout | 3y | 35 |
| 62 | Book Review: Human Compatible | Scott Alexander | 2y | 6 |
| 55 | My take on Michael Littman on "The HCI of HAI" | Alex Flint | 1y | 4 |
| 49 | You cannot be mistaken about (not) wanting to wirehead | Kaj_Sotala | 12y | 79 |
| 47 | Reward model hacking as a challenge for reward learning | Erik Jenner | 8mo | 1 |
| 45 | A Short Dialogue on the Meaning of Reward Functions | Leon Lang | 1mo | 0 |
| 43 | Draft papers for REALab and Decoupled Approval on tampering | Jonathan Uesato | 2y | 2 |
| 38 | A definition of wireheading | Anja | 10y | 80 |
| 37 | AI Safety Prerequisites Course: Basic abstract representations of computation | RAISE | 3y | 2 |
| Karma | Title | Author | Age | Comments |
|---|---|---|---|---|
| 352 | EfficientZero: How It Works | 1a3orn | 1y | 42 |
| 271 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38 |
| 129 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52 |
| 109 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12 |
| 91 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7 |
| 65 | Competitive programming with AlphaCode | Algon | 10mo | 37 |
| 45 | The Problem With The Current State of AGI Definitions | Yitz | 6mo | 22 |
| 45 | Remaking EfficientZero (as best I can) | Hoagy | 5mo | 9 |
| 35 | What's the Most Impressive Thing That GPT-4 Could Plausibly Do? | bayesed | 3mo | 24 |
| 33 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17 |
| 31 | Note on Terminology: "Rationality", not "Rationalism" | Vladimir_Nesov | 11y | 51 |
| 30 | Do Humans Want Things? | lukeprog | 11y | 53 |
| 20 | How might we make better use of AI capabilities research for alignment purposes? | ghostwheel | 3mo | 4 |
| 18 | Compact vs. Wide Models | Vaniver | 4y | 5 |