Tags (1564 posts): AI, Inner Alignment, Interpretability (ML & AI), AI Timelines, GPT, Research Agendas, AI Takeoff, Value Learning, Machine Learning (ML), Conjecture (org), Mesa-Optimization, Outer Alignment
Tags (349 posts): Abstraction, Impact Regularization, Rationality, World Modeling, Decision Theory, Human Values, Goal-Directedness, Anthropics, Utility Functions, Finite Factored Sets, Shard Theory, Fixed Point Theorems
Karma | Title | Author | Age | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
259 | Humans are very reliable agents | alyssavance | 6mo | 35
259 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
252 | What 2026 looks like | Daniel Kokotajlo | 1y | 98
242 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
241 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
234 | chinchilla's wild implications | nostalgebraist | 4mo | 114
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
231 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
219 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
573 | Where I agree and disagree with Eliezer | paulfchristiano | 6mo | 205
239 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
201 | What's Up With Confusingly Pervasive Consequentialism? | Raemon | 11mo | 88
170 | Utility Maximization = Description Length Minimization | johnswentworth | 1y | 40
159 | Humans provide an untapped wealth of evidence about alignment | TurnTrout | 5mo | 92
159 | My research methodology | paulfchristiano | 1y | 36
159 | Testing The Natural Abstraction Hypothesis: Project Intro | johnswentworth | 1y | 34
155 | Evolution of Modularity | johnswentworth | 3y | 12
155 | The shard theory of human values | Quintin Pope | 3mo | 57
154 | Realism about rationality | Richard_Ngo | 4y | 145
153 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 12mo | 26
146 | Saving Time | Scott Garrabrant | 1y | 19
145 | Fixing The Good Regulator Theorem | johnswentworth | 1y | 25
140 | Shard Theory: An Overview | David Udell | 4mo | 34