Tree of Tags

1446 posts: AI, Interpretability (ML & AI), AI Timelines, GPT, Research Agendas, Value Learning, AI Takeoff, Conjecture (org), Embedded Agency, Machine Learning (ML), Eliciting Latent Knowledge (ELK), Community

118 posts: Inner Alignment, Optimization, Solomonoff Induction, Predictive Processing, Selection vs Control, Neocortex, Mesa-Optimization, Neuroscience, Priors, AI Services (CAIS), Occam's Razor, General Intelligence

318 DeepMind alignment team opinions on AGI ruin arguments

Vika

4mo

34

259 Humans are very reliable agents

alyssavance

6mo

35

259 Two-year update on my personal AI timelines

Ajeya Cotra

4mo

60

258 The Parable of Predict-O-Matic

abramdemski

3y

42

254 A Mechanistic Interpretability Analysis of Grokking

Neel Nanda

4mo

39

252 What 2026 looks like

Daniel Kokotajlo

1y

98

242 Visible Thoughts Project and Bounty Announcement

So8res

1y

104

241 Discussion with Eliezer Yudkowsky on AGI interventions

Rob Bensinger

1y

257

234 chinchilla's wild implications

nostalgebraist

4mo

114

233 Reward is not the optimization target

TurnTrout

4mo

97

231 On how various plans miss the hard bits of the alignment challenge

So8res

5mo

81

231 DeepMind: Generally capable agents emerge from open-ended play

Daniel Kokotajlo

1y

53

223 A challenge for AGI organizations, and a challenge for readers

Rob Bensinger

19d

30

219 ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano

1y

88

206 The ground of optimization

Alex Flint

2y

74

175 Inner Alignment: Explain like I'm 12 Edition

Rafael Harth

2y

46

162 The Solomonoff Prior is Malign

Mark Xu

2y

52

151 Matt Botvinick on the spontaneous emergence of learning algorithms

Adam Scholl

2y

87

148 Risks from Learned Optimization: Introduction

evhub

3y

42

141 Inner Alignment in Salt-Starved Rats

Steven Byrnes

2y

39

137 Selection vs Control

abramdemski

3y

25

123 Book review: "A Thousand Brains" by Jeff Hawkins

Steven Byrnes

1y

18

122 A Semitechnical Introductory Dialogue on Solomonoff Induction

Eliezer Yudkowsky

1y

34

114 My computational framework for the brain

Steven Byrnes

2y

26

113 Demons in Imperfect Search

johnswentworth

2y

21

103 Gradient hacking

evhub

3y

39

102 Optimization Amplifies

Scott Garrabrant

4y

12

98 The Inner Alignment Problem

evhub

3y

17