Tree of Tags

64 posts: AI Risk, Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day

19 posts: Goodhart's Law, Modeling People

724 AGI Ruin: A List of Lethalities

Eliezer Yudkowsky

6mo

653

336 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

205 Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Ben Pace

3y

60

153 Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout

3y

38

148 AGI ruin scenarios are likely (and disjunctive)

So8res

4mo

37

144 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

111 AI Safety "Success Stories"

Wei_Dai

3y

27

109 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

105 The Main Sources of AI Risk?

Daniel Kokotajlo

3y

25

103 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

102 What can the principal-agent literature tell us about AI risk?

Alexis Carlier

2y

31

98 Niceness is unnatural

So8res

2mo

18

93 The alignment problem from a deep learning perspective

Richard_Ngo

4mo

13

93 Announcement: AI alignment prize round 3 winners and next round

cousin_it

4y

7

180 Goodhart Taxonomy

Scott Garrabrant

4y

33

70 Classifying specification problems as variants of Goodhart's Law

Vika

3y

5

68 How does Gradient Descent Interact with Goodhart?

Scott Garrabrant

3y

19

44 Defeating Goodhart and the "closest unblocked strategy" problem

Stuart_Armstrong

3y

15

44 Does Bayes Beat Goodhart?

abramdemski

3y

26

43 Specification gaming examples in AI

Vika

4y

9

43 Bounding Goodhart's Law

eric_langlois

4y

2

42 Using expected utility for Good(hart)

Stuart_Armstrong

4y

5

40 Introduction to Reducing Goodhart

Charlie Steiner

1y

10

30 Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1)

Davidmanheim

3y

5

28 All I know is Goodhart

Stuart_Armstrong

3y

23

27 Competent Preferences

Charlie Steiner

1y

2

26 What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2)

Davidmanheim

3y

7

24 Goodhart's Curse and Limitations on AI Alignment

Gordon Seidoh Worley

3y

18