AI Risk
83 posts

Related tags: Goodhart's Law, Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, Satisficer, LessWrong Event Transcripts, Modeling People, Petrov Day, World Optimization, Threat Models, Existential Risk, Coordination / Cooperation, Academic Papers, AI Safety Camp, Practical, Ethics & Morality, Symbol Grounding, Security Mindset, Sharp Left Turn, Fiction
Karma · Title · Author · Posted · Comments
56 · You can still fetch the coffee today if you're dead tomorrow · davidad · 11d · 15
141 · Worlds Where Iterative Design Fails · johnswentworth · 3mo · 26
108 · AI will change the world, but won't take it over by playing "3-dimensional chess" · boazbarak · 28d · 86
429 · Counterarguments to the basic AI x-risk case · KatjaGrace · 2mo · 122
9 · Corrigibility Via Thought-Process Deference · Thane Ruthenis · 26d · 5
91 · Oversight Misses 100% of Thoughts The AI Does Not Think · johnswentworth · 4mo · 49
16 · Misalignment-by-default in multi-agent systems · Edouard Harris · 2mo · 8
183 · Seeking Power is Often Convergently Instrumental in MDPs · TurnTrout · 3y · 38
61 · What does it mean for an AGI to be 'safe'? · So8res · 2mo · 32
31 · Instrumental convergence in single-agent systems · Edouard Harris · 2mo · 4
83 · Niceness is unnatural · So8res · 2mo · 18
93 · The alignment problem from a deep learning perspective · Richard_Ngo · 4mo · 13
161 · AGI ruin scenarios are likely (and disjunctive) · So8res · 4mo · 37
72 · Complex Systems for AI Safety [Pragmatic AI Safety #3] · Dan H · 7mo · 2
189 · The next decades might be wild · Marius Hobbhahn · 5d · 21
82 · Thoughts on AGI organizations and capabilities work · Rob Bensinger · 13d · 17
44 · AI X-risk >35% mostly based on a recent peer-reviewed argument · michaelcohen · 1mo · 31
69 · Deconfusing Direct vs Amortised Optimization · beren · 18d · 6
67 · We may be able to see sharp left turns coming · Ethan Perez · 3mo · 26
80 · Don't leave your fingerprints on the future · So8res · 2mo · 32
30 · Refining the Sharp Left Turn threat model, part 2: applying alignment techniques · Vika · 25d · 4
144 · An Update on Academia vs. Industry (one year into my faculty job) · David Scott Krueger (formerly: capybaralet) · 3mo · 18
292 · A central AI alignment problem: capabilities generalization, and the sharp left turn · So8res · 6mo · 48
45 · The Dumbest Possible Gets There First · Artaxerxes · 4mo · 7
136 · AI coordination needs clear wins · evhub · 3mo · 15
23 · Concrete Advice for Forming Inside Views on AI Safety · Neel Nanda · 4mo · 6
83 · Refining the Sharp Left Turn threat model, part 1: claims and mechanisms · Vika · 4mo · 3
58 · A survey of tool use and workflows in alignment research · Logan Riggs · 9mo · 5