1913 posts: AI, World Modeling, Inner Alignment, Rationality, Interpretability (ML & AI), AI Timelines, Decision Theory, GPT, Research Agendas, Abstraction, Value Learning, Impact Regularization

855 posts: Logical Induction, Threat Models, Goodhart's Law, Practice & Philosophy of Science, Logical Uncertainty, Intellectual Progress (Society-Level), Radical Probabilism, Epistemology, Ethics & Morality, Software Tools, Fiction, Bayes' Theorem
Karma | Title | Author | Age | Comments
981 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
759 | Simulators | janus | 3mo | 103
503 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
494 | chinchilla's wild implications | nostalgebraist | 4mo | 114
486 | What 2026 looks like | Daniel Kokotajlo | 1y | 98
422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
410 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
409 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
381 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
334 | EfficientZero: How It Works | 1a3orn | 1y | 42
324 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
315 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
297 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54

Karma | Title | Author | Age | Comments
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
517 | It Looks Like You're Trying To Take Over The World | gwern | 9mo | 125
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
416 | What failure looks like | paulfchristiano | 3y | 49
413 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
305 | Alignment Research Field Guide | abramdemski | 3y | 9
292 | A central AI alignment problem: capabilities generalization, and the sharp left turn | So8res | 6mo | 48
284 | Six Dimensions of Operational Adequacy in AGI Projects | Eliezer Yudkowsky | 6mo | 65
265 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn | 2mo | 16
252 | What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) | Andrew_Critch | 1y | 60
240 | Another (outer) alignment failure story | paulfchristiano | 1y | 38
227 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60
221 | Call For Distillers | johnswentworth | 8mo | 42
207 | Some AI research areas and their relevance to existential safety | Andrew_Critch | 2y | 40