Branch 1 (4148 posts): AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law
Branch 2 (14574 posts): Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)

Top posts (first listing):

Karma | Title | Author | Posted | Comments
515 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
405 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
296 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
287 | What DALL-E 2 can and cannot do | Swimmer963 | 7mo | 305
275 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60
242 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
230 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
228 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
218 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
218 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
217 | Contra Hofstadter on GPT-3 Nonsense | rictic | 6mo | 22
217 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
216 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
215 | A Quick Guide to Confronting Doom | Ruby | 8mo | 36
Top posts (second listing):

Karma | Title | Author | Posted | Comments
281 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
249 | Humans are very reliable agents | alyssavance | 6mo | 35
218 | Reward is not the optimization target | TurnTrout | 4mo | 97
216 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
194 | EfficientZero: How It Works | 1a3orn | 1y | 42
177 | Impossibility results for unbounded utilities | paulfchristiano | 10mo | 104
158 | Are wireheads happy? | Scott Alexander | 12y | 107
147 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29
146 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
140 | Saving Time | Scott Garrabrant | 1y | 19
139 | EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised | gwern | 1y | 52
138 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
131 | Humans are utility monsters | PhilGoetz | 9y | 217
129 | Embedded Agents | abramdemski | 4y | 41