Mesa-Optimization (10 posts)

| Karma | Title | Author | Age | Comments |
|-------|-------|--------|-----|----------|
| 184 | Risks from Learned Optimization: Introduction | evhub | 3y | 42 |
| 75 | Conditions for Mesa-Optimization | evhub | 3y | 48 |
| 71 | Risks from Learned Optimization: Conclusion and Related Work | evhub | 3y | 4 |
| 58 | [AN #58] Mesa optimization: what it is, and why we should care | Rohin Shah | 3y | 9 |
| 56 | Formal Solution to the Inner Alignment Problem | michaelcohen | 1y | 123 |
| 52 | Meta learning to gradient hack | Quintin Pope | 1y | 11 |
| 52 | Agency As a Natural Abstraction | Thane Ruthenis | 7mo | 9 |
| 34 | Mesa-Search vs Mesa-Control | abramdemski | 2y | 45 |
| 24 | [ASoT] Some thoughts about deceptive mesaoptimization | leogao | 8mo | 5 |
| 22 | Thoughts on gradient hacking | Richard_Ngo | 1y | 12 |

Outer Alignment (12 posts)

| Karma | Title | Author | Age | Comments |
|-------|-------|--------|-----|----------|
| 65 | Human Mimicry Mainly Works When We’re Already Close | johnswentworth | 4mo | 16 |
| 56 | Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) | LawrenceC | 4d | 10 |
| 51 | An Increasingly Manipulative Newsfeed | Michaël Trazzi | 3y | 16 |
| 44 | "Inner Alignment Failures" Which Are Actually Outer Alignment Failures | johnswentworth | 2y | 38 |
| 40 | The Steering Problem | paulfchristiano | 4y | 12 |
| 26 | If I were a well-intentioned AI... III: Extremal Goodhart | Stuart_Armstrong | 2y | 0 |
| 21 | "Designing agent incentives to avoid reward tampering", DeepMind | gwern | 3y | 15 |
| 20 | If I were a well-intentioned AI... II: Acting in a world | Stuart_Armstrong | 2y | 0 |
| 16 | Outer alignment and imitative amplification | evhub | 2y | 11 |
| 8 | Inner alignment: what are we pointing at? | lcmgcd | 3mo | 2 |
| 7 | [ASoT] Some thoughts about imperfect world modeling | leogao | 8mo | 0 |
| 5 | Planning capacity and daemons | lcmgcd | 2mo | 0 |