13671 posts
Rationality · World Modeling · Practical · World Optimization · Covid-19 · Community · Fiction · Site Meta · Scholarship & Learning · Politics · Book Reviews · Open Threads
18722 posts
AI · AI Risk · GPT · AI Timelines · Decision Theory · Interpretability (ML & AI) · Machine Learning (ML) · AI Takeoff · Inner Alignment · Anthropics · Research Agendas · Language Models
Karma · Title · Author · Posted · Comments
75 · Shard Theory in Nine Theses: a Distillation and Critical Appraisal · LawrenceC · 1d · 9
20 · Marvel Snap: Phase 2 · Zvi · 9h · 1
73 · The True Spirit of Solstice? · Raemon · 1d · 23
34 · More notes from raising a late-talking kid · Steven Byrnes · 21h · 1
35 · AGI Timelines in Governance: Different Strategies for Different Timeframes · simeon_c · 1d · 14
23 · Notice when you stop reading right before you understand · just_browsing · 18h · 4
106 · How to Convince my Son that Drugs are Bad · concerned_dad · 3d · 77
40 · Results from a survey on tool use and workflows in alignment research · jacquesthibs · 1d · 2
21 · [Fiction] Unspoken Stone · Gordon Seidoh Worley · 18h · 0
54 · Why Are Women Hot? · Jacob Falkovich · 2d · 10
29 · CEA Disambiguation · jefftk · 1d · 0
8 · Under-Appreciated Ways to Use Flashcards · Florence Hinder · 11h · 0
22 · Avoiding Psychopathic AI · Cameron Berg · 1d · 2
16 · our deepest wishes · carado · 23h · 0
46 · K-complexity is silly; use cross-entropy instead · So8res · 1h · 4
27 · Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 3
7 · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0
13 · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1
29 · Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 0
33 · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6
40 · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20
43 · Next Level Seinfeld · Zvi · 1d · 6
70 · Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 17
10 · An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 11
199 · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5
106 · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18
108 · The next decades might be wild · Marius Hobbhahn · 5d · 21
70 · Can we efficiently explain model behaviors? · paulfchristiano · 4d · 0