Tree of Tags

166 posts: AI Risk, Goodhart's Law, World Optimization, Threat Models, Instrumental Convergence, Corrigibility, Existential Risk, Coordination / Cooperation, Academic Papers, AI Safety Camp, Ethics & Morality, Treacherous Turn

689 posts: Newsletters, Logical Induction, Epistemology, SERI MATS, Logical Uncertainty, Intellectual Progress (Society-Level), Practice & Philosophy of Science, AI Alignment Fieldbuilding, Distillation & Pedagogy, Bayes' Theorem, Postmortems & Retrospectives, Radical Probabilism

155 The next decades might be wild

Marius Hobbhahn

5d

21

58 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

144 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

103 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

94 Thoughts on AGI organizations and capabilities work

Rob Bensinger

13d

17

336 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

36 AI X-risk >35% mostly based on a recent peer-reviewed argument

michaelcohen

1mo

31

48 Deconfusing Direct vs Amortised Optimization

beren

18d

6

13 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

50 We may be able to see sharp left turns coming

Ethan Perez

3mo

26

93 Don't leave your fingerprints on the future

So8res

2mo

32

85 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

36 Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Vika

25d

4

118 An Update on Academia vs. Industry (one year into my faculty job)

David Scott Krueger (formerly: capybaralet)

3mo

18

54 Methodological Therapy: An Agenda For Tackling Research Bottlenecks

adamShimi

2mo

6

135 Your posts should be on arXiv

JanBrauner

3mo

39

124 Logical induction for software engineers

Alex Flint

17d

2

68 Principles of Privacy for Alignment Research

johnswentworth

4mo

30

35 Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism

Spencer Becker-Kahn

3mo

4

17 What are concrete examples of potential "lock-in" in AI research?

Grue_Slinky

3y

6

161 Most People Start With The Same Few Bad Ideas

johnswentworth

3mo

30

25 [AN #112]: Engineering a Safer World

Rohin Shah

2y

2

119 Conjecture: Internal Infohazard Policy

Connor Leahy

4mo

6

23 Attempts at Forwarding Speed Priors

james.lucassen

2mo

2

21 Rob B's Shortform Feed

Rob Bensinger

3y

79

84 How to do theoretical research, a personal perspective

Mark Xu

4mo

4

92 Intuitions about solving hard problems

Richard_Ngo

7mo

23

15 Abram Demski's ELK thoughts and proposal - distillation

Rubi J. Hudson

5mo

4