Tags — 4148 posts:
AI, AI Risk, GPT, AI Timelines, Machine Learning (ML), Anthropics, AI Takeoff, Interpretability (ML & AI), Existential Risk, Inner Alignment, Neuroscience, Goodhart's Law

Tags — 14574 posts:
Decision Theory, Utility Functions, Embedded Agency, Value Learning, Suffering, Counterfactuals, Nutrition, Animal Welfare, Newcomb's Problem, Research Agendas, VNM Theorem, Risks of Astronomical Suffering (S-risks)
| Score | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 27 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3 |
| 62 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 6 | Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic | Akash | 2h | 0 |
| 37 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 45 | Next Level Seinfeld | Zvi | 1d | 6 |
| 91 | Bad at Arithmetic, Promising at Math | cohenmacaulay | 2d | 17 |
| 13 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11 |
| 21 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0 |
| 153 | The next decades might be wild | Marius Hobbhahn | 5d | 21 |
| 232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18 |
| 63 | Can we efficiently explain model behaviors? | paulfchristiano | 4d | 0 |
| 60 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10 |
| 55 | Proper scoring rules don't guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 37 | K-complexity is silly; use cross-entropy instead | So8res | 1h | 4 |
| 10 | Note on algorithms with multiple trained components | Steven Byrnes | 6h | 1 |
| 34 | My AGI safety research—2022 review, '23 plans | Steven Byrnes | 6d | 6 |
| 47 | Take 7: You should talk about "the human's utility function" less. | Charlie Steiner | 12d | 22 |
| 23 | How can one literally buy time (from x-risk) with money? | Alex_Altair | 7d | 3 |
| 81 | When AI solves a game, focus on the game's mechanics, not its theme. | Cleo Nardo | 27d | 7 |
| 19 | Using Obsidian if you're used to using Roam | Solenoid_Entity | 9d | 4 |
| 27 | "Attention Passengers": not for Signs | jefftk | 13d | 10 |
| 142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 74 | Will we run out of ML data? Evidence from projecting dataset size trends | Pablo Villalobos | 1mo | 12 |
| 10 | Join the AI Testing Hackathon this Friday | Esben Kran | 8d | 0 |
| 16 | Riffing on the agent type | Quinn | 12d | 0 |
| 252 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 30 | What videos should Rational Animations make? | Writer | 24d | 23 |