4148 posts — tags: AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law
14574 posts — tags: Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 777 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205 |
| 724 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653 |
| 472 | Simulators | janus | 3mo | 103 |
| 364 | chinchilla's wild implications | nostalgebraist | 4mo | 114 |
| 364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34 |
| 351 | What DALL-E 2 can and cannot do | Swimmer963 | 7mo | 305 |
| 344 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83 |
| 338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39 |
| 336 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122 |
| 325 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257 |
| 319 | What failure looks like | paulfchristiano | 3y | 49 |
| 314 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33 |
| 310 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89 |
| 303 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60 |
| 276 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38 |
| 273 | EfficientZero: How It Works | 1a3orn | 1y | 42 |
| 258 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81 |
| 252 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 248 | Humans are very reliable agents | alyssavance | 6mo | 35 |
| 198 | Embedded Agents | abramdemski | 4y | 41 |
| 168 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14 |
| 167 | Are wireheads happy? | Scott Alexander | 12y | 107 |
| 157 | Impossibility results for unbounded utilities | paulfchristiano | 10mo | 104 |
| 147 | Can you control the past? | Joe Carlsmith | 1y | 93 |
| 145 | Introduction to Cartesian Frames | Scott Garrabrant | 2y | 29 |
| 143 | Embedded Agency (full-text version) | Scott Garrabrant | 4y | 15 |
| 142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 139 | Coherent decisions imply consistent utilities | Eliezer Yudkowsky | 3y | 81 |