Tree of Tags

1913 posts: AI, World Modeling, Inner Alignment, Rationality, Interpretability (ML & AI), AI Timelines, Decision Theory, GPT, Research Agendas, Abstraction, Value Learning, Impact Regularization

855 posts: Logical Induction, Threat Models, Goodhart's Law, Practice & Philosophy of Science, Logical Uncertainty, Intellectual Progress (Society-Level), Radical Probabilism, Epistemology, Ethics & Morality, Software Tools, Fiction, Bayes' Theorem

11 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

78 Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

1d

9

28 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

25 Existential AI Safety is NOT separate from near-term applications

scasper

7d

15

59 Reframing inner alignment

davidad

9d

13

46 Positive values seem more robust and lasting than prohibitions

TurnTrout

3d

9

45 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

233 Reward is not the optimization target

TurnTrout

4mo

97

35 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

90 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

99 Trying to disambiguate different questions about whether RLHF is “good”

Buck

6d

39

114 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

32 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

35 My AGI safety research—2022 review, ’23 plans

Steven Byrnes

6d

6

121 The next decades might be wild

Marius Hobbhahn

5d

21

60 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

147 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

98 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

106 Thoughts on AGI organizations and capabilities work

Rob Bensinger

13d

17

243 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

28 AI X-risk >35% mostly based on a recent peer-reviewed argument

michaelcohen

1mo

31

27 Deconfusing Direct vs Amortised Optimization

beren

18d

6

17 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

33 We may be able to see sharp left turns coming

Ethan Perez

3mo

26

53 Methodological Therapy: An Agenda For Tackling Research Bottlenecks

adamShimi

2mo

6

106 Don't leave your fingerprints on the future

So8res

2mo

32

79 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

126 Your posts should be on arXiv

JanBrauner

3mo

39