Branch 1: 180 posts
Tags: Research Agendas, Embedded Agency, Suffering, Agency, Animal Welfare, Risks of Astronomical Suffering (S-risks), Robust Agents, Cause Prioritization, Center on Long-Term Risk (CLR), 80,000 Hours, Crucial Considerations, Veg*nism
Branch 2: 164 posts
Tags: Value Learning, Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, The Pointers Problem, Stag Hunt, Road To AI Safety Excellence, Goals, EfficientZero
Top posts in branch 1:

Karma | Title | Author | Posted | Comments
249 | Humans are very reliable agents | alyssavance | 6mo | 35
216 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
147 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
146 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
129 | Embedded Agents | abramdemski | 4y | 41
127 | Demand offsetting | paulfchristiano | 1y | 38
117 | Our take on CHAI’s research agenda in under 1500 words | Alex Flint | 2y | 19
111 | "Just Suffer Until It Passes" | lionhearted | 4y | 26
105 | Wirehead your Chickens | shminux | 4y | 53
104 | Botworld: a cellular automaton for studying self-modifying agents embedded in their environment | So8res | 8y | 55
103 | The Power of Agency | lukeprog | 11y | 78
102 | Being a Robust Agent | Raemon | 4y | 32
102 | Robust Delegation | abramdemski | 4y | 10
96 | Announcement: AI alignment prize round 3 winners and next round | cousin_it | 4y | 7
Top posts in branch 2:

Karma | Title | Author | Posted | Comments
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
194 | EfficientZero: How It Works | 1a3orn | 1y | 42
158 | Are wireheads happy? | Scott Alexander | 12y | 107
139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
97 | The Urgent Meta-Ethics of Friendly Artificial Intelligence | lukeprog | 11y | 252
93 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
92 | Book Review: Human Compatible | Scott Alexander | 2y | 6
84 | Jitters No Evidence of Stupidity in RL | 1a3orn | 1y | 18
77 | Seriously, what goes wrong with "reward the agent when it makes you smile"? | TurnTrout | 4mo | 41
74 | Where do selfish values come from? | Wei_Dai | 11y | 62
71 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7
69 | Misc. questions about EfficientZero | Daniel Kokotajlo | 1y | 17
68 | Clarifying "AI Alignment" | paulfchristiano | 4y | 82