Tree of Tags

101 posts: Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, Stag Hunt, Road To AI Safety Excellence, Goals, Prompt Engineering, EfficientZero, PaLM

63 posts: Value Learning, The Pointers Problem

352 EfficientZero: How It Works

1a3orn

1y

42

286 Reward is not the optimization target

TurnTrout

4mo

97

271 Is AI Progress Impossible To Predict?

alyssavance

7mo

38

176 Are wireheads happy?

Scott Alexander

12y

107

129 EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern

1y

52

109 Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos

1mo

12

91 When AI solves a game, focus on the game's mechanics, not its theme.

Cleo Nardo

27d

7

80 Jitters No Evidence of Stupidity in RL

1a3orn

1y

18

75 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

71 RAISE is launching their MVP

3y

1

70 Thoughts on "Human-Compatible"

TurnTrout

3y

35

65 Competitive programming with AlphaCode

Algon

10mo

37

62 Book Review: Human Compatible

Scott Alexander

2y

6

55 My take on Michael Littman on "The HCI of HAI"

Alex Flint

1y

4

115 The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

johnswentworth

2y

43

80 The E-Coli Test for AI Alignment

johnswentworth

4y

24

77 Preface to the sequence on value learning

Rohin Shah

4y

6

66 Why we need a *theory* of human values

Stuart_Armstrong

4y

15

61 Humans can be assigned any values whatsoever…

Stuart_Armstrong

4y

26

60 Clarifying "AI Alignment"

paulfchristiano

4y

82

55 The Urgent Meta-Ethics of Friendly Artificial Intelligence

lukeprog

11y

252

53 Intuitions about goal-directed behavior

Rohin Shah

4y

15

52 AI Alignment Problem: “Human Values” don’t Actually Exist

avturchin

3y

29

50 Future directions for ambitious value learning

Rohin Shah

4y

9

50 The easy goal inference problem is still hard

paulfchristiano

4y

19

47 Using vector fields to visualise preferences and make them consistent

MichaelA

2y

32

46 What is ambitious value learning?

Rohin Shah

4y

28

44 Conclusion to the sequence on value learning

Rohin Shah

3y

20