AI (2237 posts)
Related tags: AI Timelines, AI Takeoff, Careers, Audio, Infra-Bayesianism, DeepMind, Interviews, SERI MATS, Dialogue (format), Agent Foundations, Redwood Research

Iterated Amplification (358 posts)
Related tags: Myopia, Factored Cognition, Humans Consulting HCH, Corrigibility, Interpretability (ML & AI), Debate (AI safety technique), Experiments, Self Fulfilling/Refuting Prophecies, Ought, Orthogonality Thesis, Instrumental Convergence
Karma | Title | Author | Posted | Comments
531 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
436 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
432 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
404 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
394 | Why I think strong general AI is coming soon | porby | 2mo | 126
373 | We Choose To Align AI | johnswentworth | 11mo | 15
332 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
331 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60
323 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
314 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54
310 | Are we in an AI overhang? | Andy Jones | 2y | 109
291 | Fun with +12 OOMs of Compute | Daniel Kokotajlo | 1y | 78
287 | Don't die with dignity; instead play to your outs | Jeffrey Ladish | 8mo | 58
282 | AGI Safety FAQ / all-dumb-questions-allowed thread | Aryeh Englander | 6mo | 514
446 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
250 | The Plan - 2022 Update | johnswentworth | 19d | 33
239 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60
235 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
208 | Sorting Pebbles Into Correct Heaps | Eliezer Yudkowsky | 14y | 109
201 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
194 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38
184 | Godzilla Strategies | johnswentworth | 6mo | 65
164 | Understanding “Deep Double Descent” | evhub | 3y | 51
151 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
140 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
140 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
139 | Paul's research agenda FAQ | zhukeepa | 4y | 73
134 | Circumventing interpretability: How to defeat mind-readers | Lee Sharkey | 5mo | 8