Tree of Tags

180 posts: Research Agendas, Embedded Agency, Suffering, Agency, Animal Welfare, Risks of Astronomical Suffering (S-risks), Robust Agents, Cause Prioritization, Center on Long-Term Risk (CLR), 80,000 Hours, Crucial Considerations, Veg*nism

164 posts: Value Learning, Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, The Pointers Problem, Stag Hunt, Road To AI Safety Excellence, Goals, EfficientZero

258 On how various plans miss the hard bits of the alignment challenge

So8res

5mo

81

248 Humans are very reliable agents

alyssavance

6mo

35

198 Embedded Agents

abramdemski

4y

41

168 Some conceptual alignment research projects

Richard_Ngo

3mo

14

145 Introduction to Cartesian Frames

Scott Garrabrant

2y

29

143 Embedded Agency (full-text version)

Scott Garrabrant

4y

15

131 Demand offsetting

paulfchristiano

1y

38

130 Being a Robust Agent

Raemon

4y

32

112 Our take on CHAI’s research agenda in under 1500 words

Alex Flint

2y

19

110 Robust Delegation

abramdemski

4y

10

103 The Power of Agency

lukeprog

11y

78

100 Subsystem Alignment

abramdemski

4y

12

93 Announcement: AI alignment prize round 3 winners and next round

cousin_it

4y

7

88 "Just Suffer Until It Passes"

lionhearted

4y

26

276 Is AI Progress Impossible To Predict?

alyssavance

7mo

38

273 EfficientZero: How It Works

1a3orn

1y

42

252 Reward is not the optimization target

TurnTrout

4mo

97

167 Are wireheads happy?

Scott Alexander

12y

107

134 EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern

1y

52

104 The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

johnswentworth

2y

43

82 Jitters No Evidence of Stupidity in RL

1a3orn

1y

18

81 When AI solves a game, focus on the game's mechanics, not its theme.

Cleo Nardo

27d

7

77 Book Review: Human Compatible

Scott Alexander

2y

6

76 Seriously, what goes wrong with "reward the agent when it makes you smile"?

TurnTrout

4mo

41

76 The Urgent Meta-Ethics of Friendly Artificial Intelligence

lukeprog

11y

252

74 Will we run out of ML data? Evidence from projecting dataset size trends

Pablo Villalobos

1mo

12

69 The E-Coli Test for AI Alignment

johnswentworth

4y

24

68 Preface to the sequence on value learning

Rohin Shah

4y

6