13671 posts
Rationality · World Modeling · Practical · World Optimization · Covid-19 · Community · Fiction · Site Meta · Scholarship & Learning · Politics · Book Reviews · Open Threads
18722 posts
AI · AI Risk · GPT · AI Timelines · Decision Theory · Interpretability (ML & AI) · Machine Learning (ML) · AI Takeoff · Inner Alignment · Anthropics · Research Agendas · Language Models
Karma · Title · Author · Posted · Comments
75 · Shard Theory in Nine Theses: a Distillation and Critical Appraisal · LawrenceC · 1d · 9
20 · Marvel Snap: Phase 2 · Zvi · 9h · 1
73 · The True Spirit of Solstice? · Raemon · 1d · 23
34 · More notes from raising a late-talking kid · Steven Byrnes · 21h · 1
35 · AGI Timelines in Governance: Different Strategies for Different Timeframes · simeon_c · 1d · 14
23 · Notice when you stop reading right before you understand · just_browsing · 18h · 4
106 · How to Convince my Son that Drugs are Bad · concerned_dad · 3d · 77
40 · Results from a survey on tool use and workflows in alignment research · jacquesthibs · 1d · 2
21 · [Fiction] Unspoken Stone · Gordon Seidoh Worley · 18h · 0
54 · Why Are Women Hot? · Jacob Falkovich · 2d · 10
29 · CEA Disambiguation · jefftk · 1d · 0
8 · Under-Appreciated Ways to Use Flashcards · Florence Hinder · 11h · 0
22 · Avoiding Psychopathic AI · Cameron Berg · 1d · 2
16 · our deepest wishes · carado · 23h · 0
46 · K-complexity is silly; use cross-entropy instead · So8res · 1h · 4
27 · Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 3
7 · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0
13 · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1
29 · Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 0
33 · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6
40 · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20
43 · Next Level Seinfeld · Zvi · 1d · 6
70 · Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 17
10 · An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 11
199 · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5
106 · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18
108 · The next decades might be wild · Marius Hobbhahn · 5d · 21
70 · Can we efficiently explain model behaviors? · paulfchristiano · 4d · 0