166 posts: AI Risk, Goodhart's Law, World Optimization, Threat Models, Instrumental Convergence, Corrigibility, Existential Risk, Coordination / Cooperation, Academic Papers, AI Safety Camp, Ethics & Morality, Treacherous Turn
689 posts: Newsletters, Logical Induction, Epistemology, SERI MATS, Logical Uncertainty, Intellectual Progress (Society-Level), Practice & Philosophy of Science, AI Alignment Fieldbuilding, Distillation & Pedagogy, Bayes' Theorem, Postmortems & Retrospectives, Radical Probabilism
Karma · Title · Author · Posted · Comments

121 · The next decades might be wild · Marius Hobbhahn · 5d · 21
60 · You can still fetch the coffee today if you're dead tomorrow · davidad · 11d · 15
147 · Worlds Where Iterative Design Fails · johnswentworth · 3mo · 26
98 · AI will change the world, but won’t take it over by playing “3-dimensional chess”. · boazbarak · 28d · 86
106 · Thoughts on AGI organizations and capabilities work · Rob Bensinger · 13d · 17
243 · Counterarguments to the basic AI x-risk case · KatjaGrace · 2mo · 122
28 · AI X-risk >35% mostly based on a recent peer-reviewed argument · michaelcohen · 1mo · 31
27 · Deconfusing Direct vs Amortised Optimization · beren · 18d · 6
17 · Corrigibility Via Thought-Process Deference · Thane Ruthenis · 26d · 5
33 · We may be able to see sharp left turns coming · Ethan Perez · 3mo · 26
106 · Don't leave your fingerprints on the future · So8res · 2mo · 32
79 · Oversight Misses 100% of Thoughts The AI Does Not Think · johnswentworth · 4mo · 49
42 · Refining the Sharp Left Turn threat model, part 2: applying alignment techniques · Vika · 25d · 4
92 · An Update on Academia vs. Industry (one year into my faculty job) · David Scott Krueger (formerly: capybaralet) · 3mo · 18
53 · Methodological Therapy: An Agenda For Tackling Research Bottlenecks · adamShimi · 2mo · 6
126 · Your posts should be on arXiv · JanBrauner · 3mo · 39
116 · Logical induction for software engineers · Alex Flint · 17d · 2
82 · Principles of Privacy for Alignment Research · johnswentworth · 4mo · 30
30 · Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism · Spencer Becker-Kahn · 3mo · 4
9 · What are concrete examples of potential "lock-in" in AI research? · Grue_Slinky · 3y · 6
164 · Most People Start With The Same Few Bad Ideas · johnswentworth · 3mo · 30
22 · [AN #112]: Engineering a Safer World · Rohin Shah · 2y · 2
71 · Conjecture: Internal Infohazard Policy · Connor Leahy · 4mo · 6
17 · Attempts at Forwarding Speed Priors · james.lucassen · 2mo · 2
29 · Rob B's Shortform Feed · Rob Bensinger · 3y · 79
95 · How to do theoretical research, a personal perspective · Mark Xu · 4mo · 4
87 · Intuitions about solving hard problems · Richard_Ngo · 7mo · 23
15 · Abram Demski's ELK thoughts and proposal - distillation · Rubi J. Hudson · 5mo · 4