29 posts
Tags: Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day

35 posts
Tags: AI Risk
Karma | Title | Author | Posted | Comments
56 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15
9 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
183 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38
15 | On corrigibility and its basin | Donald Hobson | 6mo | 3
13 | Petrov corrigibility | Stuart_Armstrong | 4y | 10
3 | Corrigible omniscient AI capable of making clones | Kaj_Sotala | 7y | 0
6 | An Idea For Corrigible, Recursively Improving Math Oracles | jimrandomh | 7y | 0
7 | A first look at the hard problem of corrigibility | jessicata | 7y | 0
30 | Do what we mean vs. do what we say | Rohin Shah | 4y | 14
87 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
32 | Empowerment is (almost) All We Need | jacob_cannell | 1mo | 43
39 | Solve Corrigibility Week | Logan Riggs | 1y | 21
22 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6
19 | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0
141 | Worlds Where Iterative Design Fails | johnswentworth | 3mo | 26
108 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
91 | Oversight Misses 100% of Thoughts The AI Does Not Think | johnswentworth | 4mo | 49
16 | Misalignment-by-default in multi-agent systems | Edouard Harris | 2mo | 8
61 | What does it mean for an AGI to be 'safe'? | So8res | 2mo | 32
31 | Instrumental convergence in single-agent systems | Edouard Harris | 2mo | 4
83 | Niceness is unnatural | So8res | 2mo | 18
93 | The alignment problem from a deep learning perspective | Richard_Ngo | 4mo | 13
161 | AGI ruin scenarios are likely (and disjunctive) | So8res | 4mo | 37
72 | Complex Systems for AI Safety [Pragmatic AI Safety #3] | Dan H | 7mo | 2
102 | The Main Sources of AI Risk? | Daniel Kokotajlo | 3y | 25
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
31 | AI Alignment Research Overview (by Jacob Steinhardt) | Ben Pace | 3y | 0