13671 posts
Rationality · World Modeling · Practical · World Optimization · Covid-19 · Community · Fiction · Site Meta · Scholarship & Learning · Politics · Book Reviews · Open Threads
18722 posts
AI · AI Risk · GPT · AI Timelines · Decision Theory · Interpretability (ML & AI) · Machine Learning (ML) · AI Takeoff · Inner Alignment · Anthropics · Research Agendas · Language Models
70 points · Shard Theory in Nine Theses: a Distillation and Critical Appraisal · LawrenceC · 1d · 9 comments
48 points · AGI Timelines in Governance: Different Strategies for Different Timeframes · simeon_c · 1d · 14 comments
29 points · Notice when you stop reading right before you understand · just_browsing · 18h · 4 comments
62 points · The True Spirit of Solstice? · Raemon · 1d · 23 comments
128 points · How to Convince my Son that Drugs are Bad · concerned_dad · 3d · 77 comments
48 points · Results from a survey on tool use and workflows in alignment research · jacquesthibs · 1d · 2 comments
12 points · Marvel Snap: Phase 2 · Zvi · 9h · 1 comment
28 points · More notes from raising a late-talking kid · Steven Byrnes · 21h · 1 comment
19 points · [Fiction] Unspoken Stone · Gordon Seidoh Worley · 18h · 0 comments
23 points · our deepest wishes · carado · 23h · 0 comments
10 points · Under-Appreciated Ways to Use Flashcards · Florence Hinder · 11h · 0 comments
6 points · Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development · Roman Leventov · 6h · 0 comments
26 points · Avoiding Psychopathic AI · Cameron Berg · 1d · 2 comments
23 points · CEA Disambiguation · jefftk · 1d · 0 comments
37 points · K-complexity is silly; use cross-entropy instead · So8res · 1h · 4 comments
27 points · Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 3 comments
62 points · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20 comments
6 points · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0 comments
37 points · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6 comments
10 points · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1 comment
45 points · Next Level Seinfeld · Zvi · 1d · 6 comments
91 points · Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 17 comments
13 points · An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 11 comments
21 points · Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 0 comments
153 points · The next decades might be wild · Marius Hobbhahn · 5d · 21 comments
232 points · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5 comments
123 points · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18 comments
63 points · Can we efficiently explain model behaviors? · paulfchristiano · 4d · 0 comments