Tags: AI (1913 posts) · World Modeling · Inner Alignment · Rationality · Interpretability (ML & AI) · AI Timelines · Decision Theory · GPT · Research Agendas · Abstraction · Value Learning · Impact Regularization · Logical Induction (855 posts) · Threat Models · Goodhart's Law · Practice & Philosophy of Science · Logical Uncertainty · Intellectual Progress (Society-Level) · Radical Probabilism · Epistemology · Ethics & Morality · Software Tools · Fiction · Bayes' Theorem
Karma | Title | Author | Age | Comments
26 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3
79 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
62 | Shard Theory in Nine Theses: a Distillation and Critical Appraisal | LawrenceC | 1d | 9
39 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
15 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11
251 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
7 | Note on algorithms with multiple trained components | Steven Byrnes | 7h | 1
132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
12 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0
171 | Finite Factored Sets in Pictures | Magdalena Wache | 9d | 29
67 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
56 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10
31 | Take 11: "Aligning language models" should be weirder. | Charlie Steiner | 2d | 0
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
189 | The next decades might be wild | Marius Hobbhahn | 5d | 21
40 | AI Neorealism: a threat model & success criterion for existential safety | davidad | 5d | 0
132 | Logical induction for software engineers | Alex Flint | 17d | 2
82 | Thoughts on AGI organizations and capabilities work | Rob Bensinger | 13d | 17
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
56 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15
42 | Reflections on the PIBBSS Fellowship 2022 | Nora_Ammann | 9d | 0
69 | Deconfusing Direct vs Amortised Optimization | beren | 18d | 6
108 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
265 | Lessons learned from talking to >100 academics about AI safety | Marius Hobbhahn | 2mo | 16
517 | It Looks Like You're Trying To Take Over The World | gwern | 9mo | 125
158 | Most People Start With The Same Few Bad Ideas | johnswentworth | 3mo | 30
30 | Refining the Sharp Left Turn threat model, part 2: applying alignment techniques | Vika | 25d | 4