AI (3083 posts): GPT, AI Timelines, Machine Learning (ML), AI Takeoff, Interpretability (ML & AI), Language Models, Conjecture (org), Careers, Instrumental Convergence, Iterated Amplification, Art

Anthropics (763 posts): Existential Risk, Whole Brain Emulation, Sleeping Beauty Paradox, Threat Models, Academic Papers, Space Exploration & Colonization, Great Filter, Paradoxes, Extraterrestrial Life, Pascal's Mugging, Longtermism
Posts (title · author · posted · points · comments):

- Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 27 points · 3 comments
- Towards Hodge-podge Alignment · Cleo Nardo · 1d · 62 points · 20 comments
- Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 6 points · 0 comments
- The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 37 points · 6 comments
- Next Level Seinfeld · Zvi · 1d · 45 points · 6 comments
- Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 91 points · 17 comments
- An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 13 points · 11 comments
- Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 21 points · 0 comments
- The next decades might be wild · Marius Hobbhahn · 5d · 153 points · 21 comments
- AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 232 points · 5 comments
- How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 123 points · 18 comments
- Can we efficiently explain model behaviors? · paulfchristiano · 4d · 63 points · 0 comments
- Proper scoring rules don't guarantee predicting fixed points · Johannes_Treutlein · 4d · 55 points · 2 comments
- Take 11: "Aligning language models" should be weirder. · Charlie Steiner · 2d · 29 points · 0 comments
- AI Neorealism: a threat model & success criterion for existential safety · davidad · 5d · 39 points · 0 comments
- AI Safety Seems Hard to Measure · HoldenKarnofsky · 12d · 68 points · 5 comments
- Counterarguments to the basic AI x-risk case · KatjaGrace · 2mo · 336 points · 122 comments
- Who are some prominent reasonable people who are confident that AI won't kill everyone? · Optimization Process · 15d · 61 points · 40 comments
- AI will change the world, but won't take it over by playing "3-dimensional chess". · boazbarak · 28d · 103 points · 86 comments
- Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · Jacy Reese Anthis · 28d · 95 points · 64 comments
- Where I agree and disagree with Eliezer · paulfchristiano · 6mo · 777 points · 205 comments
- all claw, no world — and other thoughts on the universal distribution · carado · 6d · 14 points · 0 comments
- AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 6mo · 724 points · 653 comments
- Could a single alien message destroy us? · Writer · 25d · 59 points · 23 comments
- Clarifying AI X-risk · zac_kenton · 1mo · 102 points · 23 comments
- Am I secretly excited for AI getting weird? · porby · 1mo · 98 points · 4 comments
- Three Fables of Magical Girls and Longtermism · Ulisse Mini · 18d · 29 points · 11 comments
- Refining the Sharp Left Turn threat model, part 2: applying alignment techniques · Vika · 25d · 36 points · 4 comments