83 posts
AI Risk
Goodhart's Law
Corrigibility
Instrumental Convergence
Treacherous Turn
Programming
2017-2019 AI Alignment Prize
Satisficer
LessWrong Event Transcripts
Modeling People
Petrov Day
83 posts
World Optimization
Threat Models
Existential Risk
Coordination / Cooperation
Academic Papers
AI Safety Camp
Practical
Ethics & Morality
Symbol Grounding
Security Mindset
Sharp Left Turn
Fiction
60
You can still fetch the coffee today if you're dead tomorrow
davidad
11d
15
147
Worlds Where Iterative Design Fails
johnswentworth
3mo
26
98
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak
28d
86
243
Counterarguments to the basic AI x-risk case
KatjaGrace
2mo
122
17
Corrigibility Via Thought-Process Deference
Thane Ruthenis
26d
5
79
Oversight Misses 100% of Thoughts The AI Does Not Think
johnswentworth
4mo
49
18
Misalignment-by-default in multi-agent systems
Edouard Harris
2mo
8
123
Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout
3y
38
83
What does it mean for an AGI to be 'safe'?
So8res
2mo
32
23
Instrumental convergence in single-agent systems
Edouard Harris
2mo
4
113
Niceness is unnatural
So8res
2mo
18
93
The alignment problem from a deep learning perspective
Richard_Ngo
4mo
13
135
AGI ruin scenarios are likely (and disjunctive)
So8res
4mo
37
24
Complex Systems for AI Safety [Pragmatic AI Safety #3]
Dan H
7mo
2
121
The next decades might be wild
Marius Hobbhahn
5d
21
106
Thoughts on AGI organizations and capabilities work
Rob Bensinger
13d
17
28
AI X-risk >35% mostly based on a recent peer-reviewed argument
michaelcohen
1mo
31
27
Deconfusing Direct vs Amortised Optimization
beren
18d
6
33
We may be able to see sharp left turns coming
Ethan Perez
3mo
26
106
Don't leave your fingerprints on the future
So8res
2mo
32
42
Refining the Sharp Left Turn threat model, part 2: applying alignment techniques
Vika
25d
4
92
An Update on Academia vs. Industry (one year into my faculty job)
David Scott Krueger (formerly: capybaralet)
3mo
18
214
A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res
6mo
48
25
The Dumbest Possible Gets There First
Artaxerxes
4mo
7
132
AI coordination needs clear wins
evhub
3mo
15
13
Concrete Advice for Forming Inside Views on AI Safety
Neel Nanda
4mo
6
59
Refining the Sharp Left Turn threat model, part 1: claims and mechanisms
Vika
4mo
3
28
A survey of tool use and workflows in alignment research
Logan Riggs
9mo
5