29 posts
Tags: Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day

35 posts
Tags: AI Risk
Karma | Title | Author | Posted | Comments
56 | You can still fetch the coffee today if you're dead tomorrow | davidad | 11d | 15
9 | Corrigibility Via Thought-Process Deference | Thane Ruthenis | 26d | 5
183 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38
15 | On corrigibility and its basin | Donald Hobson | 6mo | 3
13 | Petrov corrigibility | Stuart_Armstrong | 4y | 10
3 | Corrigible omniscient AI capable of making clones | Kaj_Sotala | 7y | 0
6 | An Idea For Corrigible, Recursively Improving Math Oracles | jimrandomh | 7y | 0
7 | A first look at the hard problem of corrigibility | jessicata | 7y | 0
30 | Do what we mean vs. do what we say | Rohin Shah | 4y | 14
87 | Let's See You Write That Corrigibility Tag | Eliezer Yudkowsky | 6mo | 67
32 | Empowerment is (almost) All We Need | jacob_cannell | 1mo | 43
39 | Solve Corrigibility Week | Logan Riggs | 1y | 21
22 | [Linkpost] Treacherous turns in the wild | Mark Xu | 1y | 6
19 | [AN #165]: When large models are more likely to lie | Rohin Shah | 1y | 0
141 | Worlds Where Iterative Design Fails | johnswentworth | 3mo | 26
108 | AI will change the world, but won’t take it over by playing “3-dimensional chess”. | boazbarak | 28d | 86
429 | Counterarguments to the basic AI x-risk case | KatjaGrace | 2mo | 122
91 | Oversight Misses 100% of Thoughts The AI Does Not Think | johnswentworth | 4mo | 49
16 | Misalignment-by-default in multi-agent systems | Edouard Harris | 2mo | 8
61 | What does it mean for an AGI to be 'safe'? | So8res | 2mo | 32
31 | Instrumental convergence in single-agent systems | Edouard Harris | 2mo | 4
83 | Niceness is unnatural | So8res | 2mo | 18
93 | The alignment problem from a deep learning perspective | Richard_Ngo | 4mo | 13
161 | AGI ruin scenarios are likely (and disjunctive) | So8res | 4mo | 37
72 | Complex Systems for AI Safety [Pragmatic AI Safety #3] | Dan H | 7mo | 2
102 | The Main Sources of AI Risk? | Daniel Kokotajlo | 3y | 25
986 | AGI Ruin: A List of Lethalities | Eliezer Yudkowsky | 6mo | 653
31 | AI Alignment Research Overview (by Jacob Steinhardt) | Ben Pace | 3y | 0