Tree of Tags

64 posts: AI Risk, Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day

19 posts: Goodhart's Law, Modeling People

60 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

98 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

243 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

462 AGI Ruin: A List of Lethalities

Eliezer Yudkowsky

6mo

653

113 Niceness is unnatural

So8res

2mo

18

147 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

83 What does it mean for an AGI to be 'safe'?

So8res

2mo

32

135 AGI ruin scenarios are likely (and disjunctive)

So8res

4mo

37

65 Eli's review of "Is power-seeking AI an existential risk?"

elifland

2mo

0

17 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

40 Empowerment is (almost) All We Need

jacob_cannell

1mo

43

93 The alignment problem from a deep learning perspective

Richard_Ngo

4mo

13

131 Let's See You Write That Corrigibility Tag

Eliezer Yudkowsky

6mo

67

79 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

18 Reducing Goodhart: Announcement, Executive Summary

Charlie Steiner

4mo

0

49 Introduction to Reducing Goodhart

Charlie Steiner

1y

10

14 Proxy misspecification and the capabilities vs. value learning race

Sam Marks

7mo

1

156 Goodhart Taxonomy

Scott Garrabrant

4y

33

32 Competent Preferences

Charlie Steiner

1y

2

79 Classifying specification problems as variants of Goodhart's Law

Vika

3y

5

72 How does Gradient Descent Interact with Goodhart?

Scott Garrabrant

3y

19

17 Models Modeling Models

Charlie Steiner

1y

5

55 Does Bayes Beat Goodhart?

abramdemski

3y

26

51 Defeating Goodhart and the "closest unblocked strategy" problem

Stuart_Armstrong

3y

15

53 Using expected utility for Good(hart)

Stuart_Armstrong

4y

5

51 Bounding Goodhart's Law

eric_langlois

4y

2

38 Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1)

Davidmanheim

3y

5

33 All I know is Goodhart

Stuart_Armstrong

3y

23