Tags: AI (1913 posts) · World Modeling · Inner Alignment · Rationality · Interpretability (ML & AI) · AI Timelines · Decision Theory · GPT · Research Agendas · Abstraction · Value Learning · Impact Regularization · Logical Induction (855 posts) · Threat Models · Goodhart's Law · Practice & Philosophy of Science · Logical Uncertainty · Intellectual Progress (Society-Level) · Radical Probabilism · Epistemology · Ethics & Morality · Software Tools · Fiction · Bayes' Theorem
Karma | Title | Author | Age | Comments
26 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3
79 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
62 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
39 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
15 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11
251 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
7 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
12 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0
171 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29
67 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
56 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10
31 | Take 11: "Aligning language models" should be weirder. | Charlie Steiner | 2d | 0
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
189 | The next decades might be wild | Marius Hobbhahn | 5d | 21
40 | AI Neorealism: a threat model & success criterion for existential safety | davidad | 5d | 0
132 | Logical induction for software engineers | Alex Flint | 17d | 2
82 | Thoughts on AGI organizations and capabilities work | Rob Bensinger | 13d | 17
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
56 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15
42 | Reflections on the PIBBSS Fellowship 2022 | Nora_Ammann | 9d | 0
69 | Deconfusing Direct vs Amortised Optimization | beren | 18d | 6
108 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
265 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn | 2mo | 16
517 | It Looks Like You're Trying To Take Over The World | gwern | 9mo | 125
158 | Most People Start With The Same Few Bad Ideas | johnswentworth | 3mo | 30
30 | Refining the Sharp Left Turn threat model, part 2: applying alignment techniques | Vika | 25d | 4