Tags (29 posts): Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day
Tags (35 posts): AI Risk
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 60 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15 |
| 17 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5 |
| 40 | Empowerment is (almost) All We Need | jacob_cannell | 1mo | 43 |
| 131 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67 |
| 80 | Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability | TurnTrout | 1y | 8 |
| 82 | Environmental Structure Can Cause Instrumental Convergence | TurnTrout | 1y | 44 |
| 183 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60 |
| 17 | On corrigibility and its basin | Donald Hobson | 6mo | 3 |
| 39 | Solve Corrigibility Week | Logan Riggs | 1y | 21 |
| 123 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38 |
| 26 | Formalizing Policy-Modification Corrigibility | TurnTrout | 1y | 6 |
| 40 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6 |
| 27 | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0 |
| 32 | MDP models are determined by the agent architecture and the environmental dynamics | TurnTrout | 1y | 34 |
| 98 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86 |
| 243 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122 |
| 462 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653 |
| 113 | Niceness is unnatural | So8res | 2mo | 18 |
| 147 | Worlds Where Iterative Design Fails | johnswentworth | 3mo | 26 |
| 83 | What does it mean for an AGI to be 'safe'? | So8res | 2mo | 32 |
| 135 | AGI ruin scenarios are likely (and disjunctive) | So8res | 4mo | 37 |
| 65 | Eli's review of "Is power-seeking AI an existential risk?" | elifland | 2mo | 0 |
| 93 | The alignment problem from a deep learning perspective | Richard_Ngo | 4mo | 13 |
| 79 | Oversight Misses 100% of Thoughts The AI Does Not Think | johnswentworth | 4mo | 49 |
| 23 | Instrumental convergence in single-agent systems | Edouard Harris | 2mo | 4 |
| 18 | Misalignment-by-default in multi-agent systems | Edouard Harris | 2mo | 8 |
| 11 | Instrumental convergence: scale and physical interactions | Edouard Harris | 2mo | 0 |
| 66 | Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment | Rob Bensinger | 1y | 37 |