Tags (4148 posts): AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law

Tags (14574 posts): Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
Top posts:

Karma | Title | Author | Posted | Comments
1043 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
1039 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
808 | Simulators | janus | 3mo | 103
531 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
521 | chinchilla's wild implications | nostalgebraist | 4mo | 114
455 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
446 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
437 | What failure looks like | paulfchristiano | 3y | 49
436 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
432 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
432 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
415 | What DALL-E 2 can and cannot do | Swimmer963 | 7mo | 305
404 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
394 | Why I think strong general AI is coming soon | porby | 2mo | 126
352 | EfficientZero: How It Works | 1a3orn | 1y | 42
300 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
286 | Reward is not the optimization target | TurnTrout | 4mo | 97
271 | Is AI Progress Impossible To Predict? | alyssavance | 7mo | 38
267 | Embedded Agents | abramdemski | 4y | 41
247 | Humans are very reliable agents | alyssavance | 6mo | 35
202 | Embedded Agency (full-text version) | Scott Garrabrant | 4y | 15
190 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
176 | Are wireheads happy? | Scott Alexander | 12y | 107
170 | Can you control the past? | Joe Carlsmith | 1y | 93
161 | Coherent decisions imply consistent utilities | Eliezer Yudkowsky | 3y | 81
158 | Being a Robust Agent | Raemon | 4y | 32
155 | why assume AGIs will optimize for fixed goals? | nostalgebraist | 6mo | 52
146 | Newcomb's Problem and Regret of Rationality | Eliezer Yudkowsky | 14y | 614