Tree of Tags

30 posts Outer Alignment Mesa-Optimization

46 posts Inner Alignment

137 Risks from Learned Optimization: Introduction

evhub

3y

42

116 List of resolved confusions about IDA

Wei_Dai

3y

18

81 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

76 "Inner Alignment Failures" Which Are Actually Outer Alignment Failures

johnswentworth

2y

38

71 An Increasingly Manipulative Newsfeed

Michaël Trazzi

3y

16

71 Prize for probable problems

paulfchristiano

4y

63

61 Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC

4d

10

58 Weak arguments against the universal prior being malign

X4vier

4y

23

47 [AN #58] Mesa optimization: what it is, and why we should care

Rohin Shah

3y

9

44 The Steering Problem

paulfchristiano

4y

12

38 Mesa-Optimizers vs “Steered Optimizers”

Steven Byrnes

2y

7

32 The Speed + Simplicity Prior is probably anti-deceptive

7mo

29

31 Outer alignment and imitative amplification

evhub

2y

11

23 [ASoT] Some thoughts about deceptive mesaoptimization

leogao

8mo

5

165 Inner Alignment: Explain like I'm 12 Edition

Rafael Harth

2y

46

108 Demons in Imperfect Search

johnswentworth

2y

21

94 Selection Theorems: A Program For Understanding Agents

johnswentworth

1y

23

93 The Inner Alignment Problem

evhub

3y

17

92 Tessellating Hills: a toy model for demons in imperfect search

DaemonicSigil

2y

17

84 Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout

18d

18

84 Open question: are minimal circuits daemon-free?

paulfchristiano

4y

70

77 Discussion: Objective Robustness and Inner Alignment Terminology

jbkjr

1y

7

74 2-D Robustness

vlad_m

3y

8

73 Mesa-Search vs Mesa-Control

abramdemski

2y

45

70 Concrete experiments in inner alignment

evhub

3y

12

68 Empirical Observations of Objective Robustness Failures

jbkjr

1y

5

65 A simple environment for showing mesa misalignment

Matthew Barnett

3y

9

60 How likely is deceptive alignment?

evhub

3mo

21