Tags (3083 posts): AI, GPT, AI Timelines, Machine Learning (ML), AI Takeoff, Interpretability (ML & AI), Language Models, Conjecture (org), Careers, Instrumental Convergence, Iterated Amplification, Art

Tags (763 posts): Anthropics, Existential Risk, Whole Brain Emulation, Sleeping Beauty Paradox, Threat Models, Academic Papers, Space Exploration & Colonization, Great Filter, Paradoxes, Extraterrestrial Life, Pascal's Mugging, Longtermism
Karma · Title · Author · Posted · Comments

27 · Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 3 comments
84 · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20 comments
41 · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6 comments
5 · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0 comments
112 · Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 17 comments
16 · An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 11 comments
47 · Next Level Seinfeld · Zvi · 1d · 6 comments
198 · The next decades might be wild · Marius Hobbhahn · 5d · 21 comments
265 · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5 comments
140 · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18 comments
6 · I believe some AI doomers are overconfident · FTPickle · 6h · 4 comments
5 · Career Scouting: Housing Coordination · koratkar · 5h · 0 comments
13 · Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 0 comments
6 · (Extremely) Naive Gradient Hacking Doesn't Work · ojorgensen · 9h · 0 comments
42 · AI Neorealism: a threat model & success criterion for existential safety · davidad · 5d · 0 comments
77 · AI Safety Seems Hard to Measure · HoldenKarnofsky · 12d · 5 comments
455 · Counterarguments to the basic AI x-risk case · KatjaGrace · 2mo · 122 comments
64 · Who are some prominent reasonable people who are confident that AI won't kill everyone? · Optimization Process · 15d · 40 comments
1039 · Where I agree and disagree with Eliezer · paulfchristiano · 6mo · 205 comments
113 · AI will change the world, but won’t take it over by playing “3-dimensional chess”. · boazbarak · 28d · 86 comments
1043 · AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 6mo · 653 comments
103 · Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · Jacy Reese Anthis · 28d · 64 comments
148 · Clarifying AI X-risk · zac_kenton · 1mo · 23 comments
13 · all claw, no world — and other thoughts on the universal distribution · carado · 6d · 0 comments
55 · Could a single alien message destroy us? · Writer · 25d · 23 comments
35 · Three Fables of Magical Girls and Longtermism · Ulisse Mini · 18d · 11 comments
117 · Am I secretly excited for AI getting weird? · porby · 1mo · 4 comments
100 · All AGI Safety questions welcome (especially basic ones) [~monthly thread] · Robert Miles · 1mo · 100 comments