Tags (3083 posts): AI, GPT, AI Timelines, Machine Learning (ML), AI Takeoff, Interpretability (ML & AI), Language Models, Conjecture (org), Careers, Instrumental Convergence, Iterated Amplification, Art

Tags (763 posts): Anthropics, Existential Risk, Whole Brain Emulation, Sleeping Beauty Paradox, Threat Models, Academic Papers, Space Exploration & Colonization, Great Filter, Paradoxes, Extraterrestrial Life, Pascal's Mugging, Longtermism
Karma · Title · Author · Posted · Comments

27 · Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 3 comments
84 · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20 comments
41 · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6 comments
5 · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0 comments
112 · Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 17 comments
16 · An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 11 comments
47 · Next Level Seinfeld · Zvi · 1d · 6 comments
198 · The next decades might be wild · Marius Hobbhahn · 5d · 21 comments
265 · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5 comments
140 · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18 comments
6 · I believe some AI doomers are overconfident · FTPickle · 6h · 4 comments
5 · Career Scouting: Housing Coordination · koratkar · 5h · 0 comments
13 · Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 0 comments
6 · (Extremely) Naive Gradient Hacking Doesn't Work · ojorgensen · 9h · 0 comments
42 · AI Neorealism: a threat model & success criterion for existential safety · davidad · 5d · 0 comments
77 · AI Safety Seems Hard to Measure · HoldenKarnofsky · 12d · 5 comments
455 · Counterarguments to the basic AI x-risk case · KatjaGrace · 2mo · 122 comments
64 · Who are some prominent reasonable people who are confident that AI won't kill everyone? · Optimization Process · 15d · 40 comments
1039 · Where I agree and disagree with Eliezer · paulfchristiano · 6mo · 205 comments
113 · AI will change the world, but won’t take it over by playing “3-dimensional chess”. · boazbarak · 28d · 86 comments
1043 · AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 6mo · 653 comments
103 · Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · Jacy Reese Anthis · 28d · 64 comments
148 · Clarifying AI X-risk · zac_kenton · 1mo · 23 comments
13 · all claw, no world — and other thoughts on the universal distribution · carado · 6d · 0 comments
55 · Could a single alien message destroy us? · Writer · 25d · 23 comments
35 · Three Fables of Magical Girls and Longtermism · Ulisse Mini · 18d · 11 comments
117 · Am I secretly excited for AI getting weird? · porby · 1mo · 4 comments
100 · All AGI Safety questions welcome (especially basic ones) [~monthly thread] · Robert Miles · 1mo · 100 comments