1125 posts
Tags: AI, Research Agendas, AI Timelines, Value Learning, AI Takeoff, Embedded Agency, Eliciting Latent Knowledge (ELK), Community, Reinforcement Learning, Iterated Amplification, Debate (AI safety technique), Game Theory
321 posts
Tags: Conjecture (org), GPT, Oracle AI, Interpretability (ML & AI), Myopia, Language Models, OpenAI, AI Boxing (Containment), Machine Learning (ML), DeepMind, Acausal Trade, Scaling Laws
Score | Title | Author | Posted | Comments
259 | Humans are very reliable agents | alyssavance | 6mo | 35
259 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
252 | What 2026 looks like | Daniel Kokotajlo | 1y | 98
242 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
241 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
231 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
219 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
217 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
214 | Are we in an AI overhang? | Andy Jones | 2y | 109
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
213 | Hiring engineers and researchers to help align GPT-3 | paulfchristiano | 2y | 14
212 | EfficientZero: How It Works | 1a3orn | 1y | 42
Score | Title | Author | Posted | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
234 | chinchilla's wild implications | nostalgebraist | 4mo | 114
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
204 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74
199 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
191 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
185 | Simulators | janus | 3mo | 103
183 | The Plan - 2022 Update | johnswentworth | 19d | 33
178 | Mysteries of mode collapse | janus | 1mo | 35
167 | Chris Olah’s views on AGI safety | evhub | 3y | 38
164 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
163 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56