Tree of Tags

4148 posts: AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law

14574 posts: Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

84 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

16 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

198 The next decades might be wild

Marius Hobbhahn

5d

21

6 I believe some AI doomers are overconfident

FTPickle

6h

4

41 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

37 Reframing inner alignment

davidad

9d

13

7 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

112 Bad at Arithmetic, Promising at Math

cohenmacaulay

2d

17

52 Existential AI Safety is NOT separate from near-term applications

scasper

7d

15

47 Next Level Seinfeld

Zvi

1d

6

26 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

11 Will Machines Ever Rule the World? MLAISU W50

Esben Kran

4d

4

108 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

28 K-complexity is silly; use cross-entropy instead

So8res

1h

4

170 Can you control the past?

Joe Carlsmith

1y

93

286 Reward is not the optimization target

TurnTrout

4mo

97

7 Note on algorithms with multiple trained components

Steven Byrnes

6h

1

-6 Ponzi schemes can be highly profitable if your timing is good

GeneSmith

8d

18

34 My AGI safety research—2022 review, ’23 plans

Steven Byrnes

6d

6

36 Take 7: You should talk about "the human's utility function" less.

Charlie Steiner

12d

22

96 wrapper-minds are the enemy

nostalgebraist

6mo

36

46 What videos should Rational Animations make?

Writer

24d

23

146 Decision theory does not imply that we get to have nice things

So8res

2mo

53

23 "Attention Passengers": not for Signs

jefftk

13d

10

48 Notes on "Can you control the past"

So8res

2mo

40

36 Humans do acausal coordination all the time

Adam Jermyn

1mo

36

19 Decision Theory but also Ghosts

eva_

1mo

21