4148 posts
AI
AI Risk
GPT
AI Timelines
Machine Learning (ML)
Anthropics
AI Takeoff
Interpretability (ML & AI)
Existential Risk
Inner Alignment
Neuroscience
Goodhart's Law
14574 posts
Decision Theory
Utility Functions
Embedded Agency
Value Learning
Suffering
Counterfactuals
Nutrition
Animal Welfare
Newcomb's Problem
Research Agendas
VNM Theorem
Risks of Astronomical Suffering (S-risks)
27
Discovering Language Model Behaviors with Model-Written Evaluations
evhub
4h
3
84
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
41
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
5
Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic
Akash
2h
0
112
Bad at Arithmetic, Promising at Math
cohenmacaulay
2d
17
16
An Open Agency Architecture for Safe Transformative AI
davidad
11h
11
47
Next Level Seinfeld
Zvi
1d
6
198
The next decades might be wild
Marius Hobbhahn
5d
21
265
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
140
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
6
I believe some AI doomers are overconfident
FTPickle
6h
4
5
Career Scouting: Housing Coordination
koratkar
5h
0
13
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Charlie Steiner
19h
0
6
(Extremely) Naive Gradient Hacking Doesn't Work
ojorgensen
9h
0
28
K-complexity is silly; use cross-entropy instead
So8res
1h
4
7
Note on algorithms with multiple trained components
Steven Byrnes
6h
1
34
My AGI safety research—2022 review, ’23 plans
Steven Byrnes
6d
6
21
How can one literally buy time (from x-risk) with money?
Alex_Altair
7d
3
91
When AI solves a game, focus on the game's mechanics, not its theme.
Cleo Nardo
27d
7
36
Take 7: You should talk about "the human's utility function" less.
Charlie Steiner
12d
22
109
Will we run out of ML data? Evidence from projecting dataset size trends
Pablo Villalobos
1mo
12
22
Using Obsidian if you're used to using Roam
Solenoid_Entity
9d
4
124
New book on s-risks
Tobias_Baumann
1mo
1
146
Decision theory does not imply that we get to have nice things
So8res
2mo
53
14
Join the AI Testing Hackathon this Friday
Esben Kran
8d
0
46
What videos should Rational Animations make?
Writer
24d
23
23
"Attention Passengers": not for Signs
jefftk
13d
10
286
Reward is not the optimization target
TurnTrout
4mo
97