Tree of Tags

166 posts: AI Risk, Goodhart's Law, World Optimization, Threat Models, Instrumental Convergence, Corrigibility, Existential Risk, Coordination / Cooperation, Academic Papers, AI Safety Camp, Ethics & Morality, Treacherous Turn

689 posts: Newsletters, Logical Induction, Epistemology, SERI MATS, Logical Uncertainty, Intellectual Progress (Society-Level), Practice & Philosophy of Science, AI Alignment Fieldbuilding, Distillation & Pedagogy, Bayes' Theorem, Postmortems & Retrospectives, Radical Probabilism

189 The next decades might be wild

Marius Hobbhahn

5d

21

56 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

141 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

108 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

82 Thoughts on AGI organizations and capabilities work

Rob Bensinger

13d

17

429 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

44 AI X-risk >35% mostly based on a recent peer-reviewed argument

michaelcohen

1mo

31

69 Deconfusing Direct vs Amortised Optimization

beren

18d

6

9 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

67 We may be able to see sharp left turns coming

Ethan Perez

3mo

26

80 Don't leave your fingerprints on the future

So8res

2mo

32

91 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

30 Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Vika

25d

4

144 An Update on Academia vs. Industry (one year into my faculty job)

David Scott Krueger (formerly: capybaralet)

3mo

18

55 Methodological Therapy: An Agenda For Tackling Research Bottlenecks

adamShimi

2mo

6

144 Your posts should be on arXiv

JanBrauner

3mo

39

132 Logical induction for software engineers

Alex Flint

17d

2

54 Principles of Privacy for Alignment Research

johnswentworth

4mo

30

40 Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism

Spencer Becker-Kahn

3mo

4

25 What are concrete examples of potential "lock-in" in AI research?

Grue_Slinky

3y

6

158 Most People Start With The Same Few Bad Ideas

johnswentworth

3mo

30

28 [AN #112]: Engineering a Safer World

Rohin Shah

2y

2

167 Conjecture: Internal Infohazard Policy

Connor Leahy

4mo

6

29 Attempts at Forwarding Speed Priors

james.lucassen

2mo

2

13 Rob B's Shortform Feed

Rob Bensinger

3y

79

73 How to do theoretical research, a personal perspective

Mark Xu

4mo

4

97 Intuitions about solving hard problems

Richard_Ngo

7mo

23

15 Abram Demski's ELK thoughts and proposal - distillation

Rubi J. Hudson

5mo

4