Branch 1 (1564 posts), tags: AI, Inner Alignment, Interpretability (ML & AI), AI Timelines, GPT, Research Agendas, AI Takeoff, Value Learning, Machine Learning (ML), Conjecture (org), Mesa-Optimization, Outer Alignment
Branch 2 (349 posts), tags: Abstraction, Impact Regularization, Rationality, World Modeling, Decision Theory, Human Values, Goal-Directedness, Anthropics, Utility Functions, Finite Factored Sets, Shard Theory, Fixed Point Theorems
Top posts, branch 1:
1. Simulators (janus, 3mo): 759 karma, 103 comments
2. (My understanding of) What Everyone in Technical Alignment is Doing and Why (Thomas Larsen, 3mo): 503 karma, 83 comments
3. chinchilla's wild implications (nostalgebraist, 4mo): 494 karma, 114 comments
4. What 2026 looks like (Daniel Kokotajlo, 1y): 486 karma, 98 comments
5. A Mechanistic Interpretability Analysis of Grokking (Neel Nanda, 4mo): 422 karma, 39 comments
6. DeepMind alignment team opinions on AGI ruin arguments (Vika, 4mo): 410 karma, 34 comments
7. Discussion with Eliezer Yudkowsky on AGI interventions (Rob Bensinger, 1y): 409 karma, 257 comments
8. EfficientZero: How It Works (1a3orn, 1y): 334 karma, 42 comments
9. The Parable of Predict-O-Matic (abramdemski, 3y): 324 karma, 42 comments
10. Two-year update on my personal AI timelines (Ajeya Cotra, 4mo): 315 karma, 60 comments
11. A challenge for AGI organizations, and a challenge for readers (Rob Bensinger, 19d): 307 karma, 30 comments
12. Why Agent Foundations? An Overly Abstract Explanation (johnswentworth, 9mo): 297 karma, 54 comments
13. Are we in an AI overhang? (Andy Jones, 2y): 296 karma, 109 comments
14. On how various plans miss the hard bits of the alignment challenge (So8res, 5mo): 285 karma, 81 comments
Top posts, branch 2:
1. Where I agree and disagree with Eliezer (paulfchristiano, 6mo): 981 karma, 205 comments
2. Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (Ajeya Cotra, 5mo): 381 karma, 89 comments
3. The shard theory of human values (Quintin Pope, 3mo): 249 karma, 57 comments
4. Realism about rationality (Richard_Ngo, 4y): 206 karma, 145 comments
5. Utility Maximization = Description Length Minimization (johnswentworth, 1y): 196 karma, 40 comments
6. Humans provide an untapped wealth of evidence about alignment (TurnTrout, 5mo): 191 karma, 92 comments
7. 2021 AI Alignment Literature Review and Charity Comparison (Larks, 12mo): 175 karma, 26 comments
8. Finite Factored Sets in Pictures (Magdalena Wache, 9d): 171 karma, 29 comments
9. Evolution of Modularity (johnswentworth, 3y): 163 karma, 12 comments
10. Can you control the past? (Joe Carlsmith, 1y): 160 karma, 93 comments
11. why assume AGIs will optimize for fixed goals? (nostalgebraist, 6mo): 146 karma, 52 comments
12. Finite Factored Sets (Scott Garrabrant, 1y): 141 karma, 94 comments
13. What's Up With Confusingly Pervasive Consequentialism? (Raemon, 11mo): 137 karma, 88 comments
14. My research methodology (paulfchristiano, 1y): 137 karma, 36 comments