Tags — 4148 posts:
AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law

Tags — 14574 posts:
Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
| Score | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 27 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3 |
| 62 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 6 | Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic | Akash | 2h | 0 |
| 37 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 45 | Next Level Seinfeld | Zvi | 1d | 6 |
| 91 | Bad at Arithmetic, Promising at Math | cohenmacaulay | 2d | 17 |
| 13 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11 |
| 21 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0 |
| 153 | The next decades might be wild | Marius Hobbhahn | 5d | 21 |
| 232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18 |
| 63 | Can we efficiently explain model behaviors? | paulfchristiano | 4d | 0 |
| 60 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10 |
| 55 | Proper scoring rules don't guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 37 | K-complexity is silly; use cross-entropy instead | So8res | 1h | 4 |
| 10 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1 |
| 34 | My AGI safety research—2022 review, '23 plans | Steven Byrnes | 6d | 6 |
| 47 | Take 7: You should talk about "the human's utility function" less. | Charlie Steiner | 12d | 22 |
| 23 | How can one literally buy time (from x-risk) with money? | Alex_Altair | 7d | 3 |
| 81 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7 |
| 19 | Using Obsidian if you're used to using Roam | Solenoid_Entity | 9d | 4 |
| 27 | "Attention Passengers": not for Signs | jefftk | 13d | 10 |
| 142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 74 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12 |
| 10 | Join the AI Testing Hackathon this Friday | Esben Kran | 8d | 0 |
| 16 | Riffing on the agent type | Quinn | 12d | 0 |
| 252 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 30 | What videos should Rational Animations make? | Writer | 24d | 23 |