AI (934 posts)
Subtopics: Value Learning, Embedded Agency, Community, Eliciting Latent Knowledge (ELK), Reinforcement Learning, Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Interviews, AI Capabilities, Inverse Reinforcement Learning
AI Timelines (80 posts)
Subtopics: AI Takeoff, AI Persuasion, History, Forecasting & Prediction, Dialogue (format), Technological Forecasting, Forecasts (Specific Predictions), Industrial Revolution, Effective Altruism, Transformative AI, Progress Studies
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 49 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15 |
| 79 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 271 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 39 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 85 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 182 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 72 | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53 |
| 67 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61 |
| 129 | Mechanistic anomaly detection and ELK | paulfchristiano | 25d | 17 |
| 42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18 |
| 87 | Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility | Akash | 28d | 20 |
| 154 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 251 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 43 | In defense of probably wrong mechanistic models | evhub | 14d | 10 |
| 104 | Applying superintelligence without collusion | Eric Drexler | 1mo | 56 |
| 197 | Yudkowsky and Christiano discuss "Takeoff Speeds" | Eliezer Yudkowsky | 1y | 181 |
| 182 | What does it take to defend the world against out-of-control AGIs? | Steven Byrnes | 1mo | 31 |
| 13 | How promising are legal avenues to restrict AI training data? | thehalliard | 10d | 2 |
| 315 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60 |
| 486 | What 2026 looks like | Daniel Kokotajlo | 1y | 98 |
| 110 | How might we align transformative AI if it’s developed very soon? | HoldenKarnofsky | 3mo | 17 |
| 198 | Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain | Daniel Kokotajlo | 1y | 85 |
| 84 | Disagreement with bio anchors that lead to shorter timelines | Marius Hobbhahn | 1mo | 16 |
| 37 | AGI Timelines Are Mostly Not Strategically Relevant To Alignment | johnswentworth | 3mo | 35 |
| 118 | Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon | johnswentworth | 8mo | 130 |
| 84 | AI strategy nearcasting | HoldenKarnofsky | 3mo | 3 |
| 62 | A review of the Bio-Anchors report | jylin04 | 2mo | 4 |
| 6 | Law-Following AI 4: Don't Rely on Vicarious Liability | Cullen_OKeefe | 4mo | 2 |