AI (3083 posts): GPT, AI Timelines, Machine Learning (ML), AI Takeoff, Interpretability (ML & AI), Language Models, Conjecture (org), Careers, Instrumental Convergence, Iterated Amplification, Art

Anthropics (763 posts): Existential Risk, Whole Brain Emulation, Sleeping Beauty Paradox, Threat Models, Academic Papers, Space Exploration & Colonization, Great Filter, Paradoxes, Extraterrestrial Life, Pascal's Mugging, Longtermism
Posts (title · author · posted · points · comments):

- Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 27 points · 3 comments
- Towards Hodge-podge Alignment · Cleo Nardo · 1d · 62 points · 20 comments
- Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 6 points · 0 comments
- The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 37 points · 6 comments
- Next Level Seinfeld · Zvi · 1d · 45 points · 6 comments
- Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 91 points · 17 comments
- An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 13 points · 11 comments
- Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 21 points · 0 comments
- The next decades might be wild · Marius Hobbhahn · 5d · 153 points · 21 comments
- AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 232 points · 5 comments
- How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 123 points · 18 comments
- Can we efficiently explain model behaviors? · paulfchristiano · 4d · 63 points · 0 comments
- Proper scoring rules don't guarantee predicting fixed points · Johannes_Treutlein · 4d · 55 points · 2 comments
- Take 11: "Aligning language models" should be weirder. · Charlie Steiner · 2d · 29 points · 0 comments
- AI Neorealism: a threat model & success criterion for existential safety · davidad · 5d · 39 points · 0 comments
- AI Safety Seems Hard to Measure · HoldenKarnofsky · 12d · 68 points · 5 comments
- Counterarguments to the basic AI x-risk case · KatjaGrace · 2mo · 336 points · 122 comments
- Who are some prominent reasonable people who are confident that AI won't kill everyone? · Optimization Process · 15d · 61 points · 40 comments
- AI will change the world, but won't take it over by playing "3-dimensional chess". · boazbarak · 28d · 103 points · 86 comments
- Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · Jacy Reese Anthis · 28d · 95 points · 64 comments
- Where I agree and disagree with Eliezer · paulfchristiano · 6mo · 777 points · 205 comments
- all claw, no world — and other thoughts on the universal distribution · carado · 6d · 14 points · 0 comments
- AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 6mo · 724 points · 653 comments
- Could a single alien message destroy us? · Writer · 25d · 59 points · 23 comments
- Clarifying AI X-risk · zac_kenton · 1mo · 102 points · 23 comments
- Am I secretly excited for AI getting weird? · porby · 1mo · 98 points · 4 comments
- Three Fables of Magical Girls and Longtermism · Ulisse Mini · 18d · 29 points · 11 comments
- Refining the Sharp Left Turn threat model, part 2: applying alignment techniques · Vika · 25d · 36 points · 4 comments