Mesa-Optimization (12 posts)

Karma  Title                                                                   Author           Age   Comments
   81  Trying to Make a Treacherous Mesa-Optimizer                             MadHatter        1mo   13
    2  How much should we worry about mesa-optimization challenges?            sudo -i          4mo   13
   -3  Do mesa-optimization problems correlate with low-slack?                 sudo -i          10mo  1
   58  Weak arguments against the universal prior being malign                 X4vier           4y    23
  137  Risks from Learned Optimization: Introduction                           evhub            3y    42
   32  The Speed + Simplicity Prior is probably anti-deceptive                                  7mo   29
   22  Three questions about mesa-optimizers                                   Eric Neyman      8mo   5
   71  Prize for probable problems                                             paulfchristiano  4y    63
   47  [AN #58] Mesa optimization: what it is, and why we should care          Rohin Shah       3y    9
    0  Is evolutionary influence the mesa objective that we're interested in?  David Johnston   7mo   2
   23  [ASoT] Some thoughts about deceptive mesaoptimization                   leogao           8mo   5
   10  Mesa-utility functions might not be purely proxy goals                  Thomas Kwa       8mo   17

Outer Alignment (18 posts)

Karma  Title                                                                   Author               Age  Comments
   61  Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)     LawrenceC            4d   10
   -1  Don't you think RLHF solves outer alignment?                            Raphaël S            1mo  19
    7  Alignment as Game Design                                                Shoshannah Tekofsky  5mo  7
    6  Inner alignment: what are we pointing at?                               lcmgcd               3mo  2
   38  Mesa-Optimizers vs “Steered Optimizers”                                 Steven Byrnes        2y   7
   31  Outer alignment and imitative amplification                             evhub                2y   11
   44  The Steering Problem                                                    paulfchristiano      4y   12
   19  Is the Star Trek Federation really incapable of building AI?            Kaj_Sotala           4y   4
   -2  Alignment via manually implementing the utility function                Chantiel             1y   6
   71  An Increasingly Manipulative Newsfeed                                   Michaël Trazzi       3y   16
  116  List of resolved confusions about IDA                                   Wei_Dai              3y   18
   76  "Inner Alignment Failures" Which Are Actually Outer Alignment Failures  johnswentworth       2y   38
   -2  The Disastrously Confident And Inaccurate AI                            Sharat Jacob Jacob   1mo  0
   19  If I were a well-intentioned AI... II: Acting in a world                Stuart_Armstrong     2y   0