Tree of Tags

10 posts Mesa-Optimization

12 posts Outer Alignment

52 Agency As a Natural Abstraction

Thane Ruthenis

7mo

9

184 Risks from Learned Optimization: Introduction

evhub

3y

42

52 Meta learning to gradient hack

Quintin Pope

1y

11

24 [ASoT] Some thoughts about deceptive mesaoptimization

leogao

8mo

5

56 Formal Solution to the Inner Alignment Problem

michaelcohen

1y

123

75 Conditions for Mesa-Optimization

evhub

3y

48

71 Risks from Learned Optimization: Conclusion and Related Work

evhub

3y

4

22 Thoughts on gradient hacking

Richard_Ngo

1y

12

58 [AN #58] Mesa optimization: what it is, and why we should care

Rohin Shah

3y

9

34 Mesa-Search vs Mesa-Control

abramdemski

2y

45

56 Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC

4d

10

65 Human Mimicry Mainly Works When We’re Already Close

johnswentworth

4mo

16

8 Inner alignment: what are we pointing at?

lcmgcd

3mo

2

5 Planning capacity and daemons

lcmgcd

2mo

0

44 "Inner Alignment Failures" Which Are Actually Outer Alignment Failures

johnswentworth

2y

38

51 An Increasingly Manipulative Newsfeed

Michaël Trazzi

3y

16

7 [ASoT] Some thoughts about imperfect world modeling

leogao

8mo

0

26 If I were a well-intentioned AI... III: Extremal Goodhart

Stuart_Armstrong

2y

0

40 The Steering Problem

paulfchristiano

4y

12

20 If I were a well-intentioned AI... II: Acting in a world

Stuart_Armstrong

2y

0

21 "Designing agent incentives to avoid reward tampering", DeepMind

gwern

3y

15

16 Outer alignment and imitative amplification

evhub

2y

11