Tags (29 posts): Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day
Tags (35 posts): AI Risk
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 60 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15 |
| 17 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5 |
| 40 | Empowerment is (almost) All We Need | jacob_cannell | 1mo | 43 |
| 131 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67 |
| 80 | Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability | TurnTrout | 1y | 8 |
| 82 | Environmental Structure Can Cause Instrumental Convergence | TurnTrout | 1y | 44 |
| 183 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60 |
| 17 | On corrigibility and its basin | Donald Hobson | 6mo | 3 |
| 39 | Solve Corrigibility Week | Logan Riggs | 1y | 21 |
| 123 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38 |
| 26 | Formalizing Policy-Modification Corrigibility | TurnTrout | 1y | 6 |
| 40 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6 |
| 27 | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0 |
| 32 | MDP models are determined by the agent architecture and the environmental dynamics | TurnTrout | 1y | 34 |
| 98 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86 |
| 243 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122 |
| 462 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653 |
| 113 | Niceness is unnatural | So8res | 2mo | 18 |
| 147 | Worlds Where Iterative Design Fails | johnswentworth | 3mo | 26 |
| 83 | What does it mean for an AGI to be 'safe'? | So8res | 2mo | 32 |
| 135 | AGI ruin scenarios are likely (and disjunctive) | So8res | 4mo | 37 |
| 65 | Eli's review of "Is power-seeking AI an existential risk?" | elifland | 2mo | 0 |
| 93 | The alignment problem from a deep learning perspective | Richard_Ngo | 4mo | 13 |
| 79 | Oversight Misses 100% of Thoughts The AI Does Not Think | johnswentworth | 4mo | 49 |
| 23 | Instrumental convergence in single-agent systems | Edouard Harris | 2mo | 4 |
| 18 | Misalignment-by-default in multi-agent systems | Edouard Harris | 2mo | 8 |
| 11 | Instrumental convergence: scale and physical interactions | Edouard Harris | 2mo | 0 |
| 66 | Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment | Rob Bensinger | 1y | 37 |