Tree of Tags

64 posts: AI Risk, Corrigibility, Instrumental Convergence, Treacherous Turn, Programming, 2017-2019 AI Alignment Prize, LessWrong Event Transcripts, Satisficer, Petrov Day

19 posts: Goodhart's Law, Modeling People

58 You can still fetch the coffee today if you're dead tomorrow

davidad

11d

15

144 Worlds Where Iterative Design Fails

johnswentworth

3mo

26

103 AI will change the world, but won’t take it over by playing “3-dimensional chess”.

boazbarak

28d

86

336 Counterarguments to the basic AI x-risk case

KatjaGrace

2mo

122

13 Corrigibility Via Thought-Process Deference

Thane Ruthenis

26d

5

85 Oversight Misses 100% of Thoughts The AI Does Not Think

johnswentworth

4mo

49

17 Misalignment-by-default in multi-agent systems

Edouard Harris

2mo

8

153 Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout

3y

38

72 What does it mean for an AGI to be 'safe'?

So8res

2mo

32

27 Instrumental convergence in single-agent systems

Edouard Harris

2mo

4

98 Niceness is unnatural

So8res

2mo

18

93 The alignment problem from a deep learning perspective

Richard_Ngo

4mo

13

148 AGI ruin scenarios are likely (and disjunctive)

So8res

4mo

37

48 Complex Systems for AI Safety [Pragmatic AI Safety #3]

Dan H

7mo

2

14 Reducing Goodhart: Announcement, Executive Summary

Charlie Steiner

4mo

0

30 Re-introducing Selection vs Control for Optimization (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 1)

Davidmanheim

3y

5

7 The Three Levels of Goodhart's Curse

Scott Garrabrant

4y

0

27 Competent Preferences

Charlie Steiner

1y

2

43 Specification gaming examples in AI

Vika

4y

9

26 What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2)

Davidmanheim

3y

7

180 Goodhart Taxonomy

Scott Garrabrant

4y

33

24 Goodhart's Curse and Limitations on AI Alignment

Gordon Seidoh Worley

3y

18

44 Defeating Goodhart and the "closest unblocked strategy" problem

Stuart_Armstrong

3y

15

40 Introduction to Reducing Goodhart

Charlie Steiner

1y

10

44 Does Bayes Beat Goodhart?

abramdemski

3y

26

22 Non-Adversarial Goodhart and AI Risks

Davidmanheim

4y

11

20 Models Modeling Models

Charlie Steiner

1y

5

43 Bounding Goodhart's Law

eric_langlois

4y

2