Branch 1 (180 posts)
Tags: Research Agendas, Embedded Agency, Suffering, Agency, Animal Welfare, Risks of Astronomical Suffering (S-risks), Robust Agents, Cause Prioritization, Center on Long-Term Risk (CLR), 80,000 Hours, Crucial Considerations, Veg*nism
Branch 2 (164 posts)
Tags: Value Learning, Reinforcement Learning, AI Capabilities, Inverse Reinforcement Learning, Wireheading, Definitions, Reward Functions, The Pointers Problem, Stag Hunt, Road To AI Safety Excellence, Goals, EfficientZero
Top posts, Branch 1 (karma · title · author · age · comments):
300 · On how various plans miss the hard bits of the alignment challenge · So8res · 5mo · 81
267 · Embedded Agents · abramdemski · 4y · 41
247 · Humans are very reliable agents · alyssavance · 6mo · 35
202 · Embedded Agency (full-text version) · Scott Garrabrant · 4y · 15
190 · Some conceptual alignment research projects · Richard_Ngo · 3mo · 14
158 · Being a Robust Agent · Raemon · 4y · 32
143 · Introduction to Cartesian Frames · Scott Garrabrant · 2y · 29
135 · Demand offsetting · paulfchristiano · 1y · 38
124 · New book on s-risks · Tobias_Baumann · 1mo · 1
118 · Robust Delegation · abramdemski · 4y · 10
113 · Subsystem Alignment · abramdemski · 4y · 12
107 · Our take on CHAI’s research agenda in under 1500 words · Alex Flint · 2y · 19
105 · Embedded World-Models · abramdemski · 4y · 16
103 · The Power of Agency · lukeprog · 11y · 78
Top posts, Branch 2 (karma · title · author · age · comments):
352 · EfficientZero: How It Works · 1a3orn · 1y · 42
286 · Reward is not the optimization target · TurnTrout · 4mo · 97
271 · Is AI Progress Impossible To Predict? · alyssavance · 7mo · 38
176 · Are wireheads happy? · Scott Alexander · 12y · 107
129 · EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised · gwern · 1y · 52
115 · The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables · johnswentworth · 2y · 43
109 · Will we run out of ML data? Evidence from projecting dataset size trends · Pablo Villalobos · 1mo · 12
91 · When AI solves a game, focus on the game's mechanics, not its theme. · Cleo Nardo · 27d · 7
80 · Jitters No Evidence of Stupidity in RL · 1a3orn · 1y · 18
80 · The E-Coli Test for AI Alignment · johnswentworth · 4y · 24
77 · Preface to the sequence on value learning · Rohin Shah · 4y · 6
75 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · TurnTrout · 4mo · 41
71 · RAISE is launching their MVP · 3y · 1
70 · Thoughts on "Human-Compatible" · TurnTrout · 3y · 35