Tree of Tags

83 posts: AI Risk, Goodhart's Law, Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, Satisficer, LessWrong Event Transcripts, Modeling People, Petrov Day

83 posts: World Optimization, Threat Models, Existential Risk, Coordination / Cooperation, Academic Papers, AI Safety Camp, Practical, Ethics & Morality, Symbol Grounding, Security Mindset, Sharp Left Turn, Fiction

986 AGI Ruin: A List of Lethalities

Eliezer Yudkowsky

6mo

653

429 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

227 Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace

3y

60

204 Goodhart Taxonomy

Scott Garrabrant

4y

33

183 Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout

3y

38

161 AGI ruin scenarios are likely (and disjunctive)

So8res

4mo

37

141 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

108 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

102 The Main Sources of AI Risk?

Daniel Kokotajlo

3y

25

98 AI Safety "Success Stories"

Wei_Dai

3y

27

97 What can the principal-agent literature tell us about AI risk?

Alexis Carlier

2y

31

93 The alignment problem from a deep learning perspective

Richard_Ngo

4mo

13

91 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

87 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

517 It Looks Like You're Trying To Take Over The World

gwern

9mo

125

416 What failure looks like

paulfchristiano

3y

49

413 How To Get Into Independent Research On Alignment/Agency

johnswentworth

1y

33

292 A central AI alignment problem: capabilities generalization, and the sharp left turn

So8res

6mo

48

284 Six Dimensions of Operational Adequacy in AGI Projects

Eliezer Yudkowsky

6mo

65

252 What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Andrew_Critch

1y

60

240 Another (outer) alignment failure story

paulfchristiano

1y

38

207 Some AI research areas and their relevance to existential safety

Andrew_Critch

2y

40

201 Reshaping the AI Industry

Thane Ruthenis

6mo

34

189 The next decades might be wild

Marius Hobbhahn

5d

21

177 Morality is Scary

Wei_Dai

1y

125

144 An Update on Academia vs. Industry (one year into my faculty job)

David Scott Krueger (formerly: capybaralet)

3mo

18

142 Clarifying “What failure looks like”

Sam Clarke

2y

14

136 AI coordination needs clear wins

evhub

3mo

15