Tree of Tags

1913 posts AI World Modeling Inner Alignment Rationality Interpretability (ML & AI) AI Timelines Decision Theory GPT Research Agendas Abstraction Value Learning Impact Regularization

855 posts Logical Induction Threat Models Goodhart's Law Practice & Philosophy of Science Logical Uncertainty Intellectual Progress (Society-Level) Radical Probabilism Epistemology Ethics & Morality Software Tools Fiction Bayes' Theorem

13 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

70 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

37 Existential AI Safety is NOT separate from near-term applications

scasper

7d

15

47 Reframing inner alignment

davidad

9d

13

42 Positive values seem more robust and lasting than prohibitions

TurnTrout

3d

9

62 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

252 Reward is not the optimization target

TurnTrout

4mo

97

37 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

96 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

92 Trying to disambiguate different questions about whether RLHF is “good”

Buck

6d

39

123 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

26 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

34 My AGI safety research—2022 review, ’23 plans

Steven Byrnes

6d

6

155 The next decades might be wild

Marius Hobbhahn

5d

21

58 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

144 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

103 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

94 Thoughts on AGI organizations and capabilities work

Rob Bensinger

13d

17

336 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

36 AI X-risk >35% mostly based on a recent peer-reviewed argument

michaelcohen

1mo

31

48 Deconfusing Direct vs Amortised Optimization

beren

18d

6

13 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

50 We may be able to see sharp left turns coming

Ethan Perez

3mo

26

54 Methodological Therapy: An Agenda For Tackling Research Bottlenecks

adamShimi

2mo

6

93 Don't leave your fingerprints on the future

So8res

2mo

32

85 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

135 Your posts should be on arXiv

JanBrauner

3mo

39