Tree of Tags

29 posts: Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day

35 posts: AI Risk

183 Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace

3y

60

131 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

123 Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout

3y

38

101 Announcement: AI alignment prize round 3 winners and next round

cousin_it

4y

7

90 Announcement: AI alignment prize round 4 winners

cousin_it

3y

41

82 Environmental Structure Can Cause Instrumental Convergence

TurnTrout

1y

44

80 Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability

TurnTrout

1y

8

70 A Gym Gridworld Environment for the Treacherous Turn

Michaël Trazzi

4y

9

60 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

55 Corrigibility

paulfchristiano

4y

7

51 Clarifying Power-Seeking and Instrumental Convergence

TurnTrout

3y

7

41 Can corrigibility be learned safely?

Wei_Dai

4y

115

40 Empowerment is (almost) All We Need

jacob_cannell

1mo

43

40 [Linkpost] Treacherous turns in the wild

Mark Xu

1y

6

462 AGI Ruin: A List of Lethalities

Eliezer Yudkowsky

6mo

653

243 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

147 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

135 AGI ruin scenarios are likely (and disjunctive)

So8res

4mo

37

124 AI Safety "Success Stories"

Wei_Dai

3y

27

113 Niceness is unnatural

So8res

2mo

18

108 The Main Sources of AI Risk?

Daniel Kokotajlo

3y

25

107 What can the principal-agent literature tell us about AI risk?

Alexis Carlier

2y

31

98 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

93 The alignment problem from a deep learning perspective

Richard_Ngo

4mo

13

91 And the AI would have got away with it too, if...

Stuart_Armstrong

3y

7

83 What does it mean for an AGI to be 'safe'?

So8res

2mo

32

83 The strategy-stealing assumption

paulfchristiano

3y

46

79 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49