13671 posts
Rationality · World Modeling · Practical · World Optimization · Covid-19 · Community · Fiction · Site Meta · Scholarship & Learning · Politics · Book Reviews · Open Threads
18722 posts
AI · AI Risk · GPT · AI Timelines · Decision Theory · Interpretability (ML & AI) · Machine Learning (ML) · AI Takeoff · Inner Alignment · Anthropics · Research Agendas · Language Models
70 points · Shard Theory in Nine Theses: a Distillation and Critical Appraisal · LawrenceC · 1d · 9 comments
48 points · AGI Timelines in Governance: Different Strategies for Different Timeframes · simeon_c · 1d · 14 comments
29 points · Notice when you stop reading right before you understand · just_browsing · 18h · 4 comments
62 points · The True Spirit of Solstice? · Raemon · 1d · 23 comments
128 points · How to Convince my Son that Drugs are Bad · concerned_dad · 3d · 77 comments
48 points · Results from a survey on tool use and workflows in alignment research · jacquesthibs · 1d · 2 comments
12 points · Marvel Snap: Phase 2 · Zvi · 9h · 1 comment
28 points · More notes from raising a late-talking kid · Steven Byrnes · 21h · 1 comment
19 points · [Fiction] Unspoken Stone · Gordon Seidoh Worley · 18h · 0 comments
23 points · our deepest wishes · carado · 23h · 0 comments
10 points · Under-Appreciated Ways to Use Flashcards · Florence Hinder · 11h · 0 comments
6 points · Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development · Roman Leventov · 6h · 0 comments
26 points · Avoiding Psychopathic AI · Cameron Berg · 1d · 2 comments
23 points · CEA Disambiguation · jefftk · 1d · 0 comments
37 points · K-complexity is silly; use cross-entropy instead · So8res · 1h · 4 comments
27 points · Discovering Language Model Behaviors with Model-Written Evaluations · evhub · 4h · 3 comments
62 points · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20 comments
6 points · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0 comments
37 points · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6 comments
10 points · Note on algorithms with multiple trained components · Steven Byrnes · 6h · 1 comment
45 points · Next Level Seinfeld · Zvi · 1d · 6 comments
91 points · Bad at Arithmetic, Promising at Math · cohenmacaulay · 2d · 17 comments
13 points · An Open Agency Architecture for Safe Transformative AI · davidad · 11h · 11 comments
21 points · Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. · Charlie Steiner · 19h · 0 comments
153 points · The next decades might be wild · Marius Hobbhahn · 5d · 21 comments
232 points · AI alignment is distinct from its near-term applications · paulfchristiano · 7d · 5 comments
123 points · How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme · Collin · 5d · 18 comments
63 points · Can we efficiently explain model behaviors? · paulfchristiano · 4d · 0 comments