Tree of Tags


Myopia (17 posts)

Deceptive Alignment / Deception (9 posts)

37 Steering Behaviour: Testing for (Non-)Myopia in Language Models (Evan R. Murphy, 15d, 16 comments)
43 Acceptability Verification: A Research Agenda (David Udell, 5mo, 0 comments)
50 LCDT, A Myopic Decision Theory (adamShimi, 1y, 51 comments)
57 Open Problems with Myopia (Mark Xu, 1y, 16 comments)
26 Understanding and controlling auto-induced distributional shift (LRudL, 1y, 3 comments)
91 The Credit Assignment Problem (abramdemski, 3y, 40 comments)
55 Why GPT wants to mesa-optimize & how we might change this (John_Maxwell, 2y, 32 comments)
56 Arguments against myopic training (Richard_Ngo, 2y, 39 comments)
55 AI safety via market making (evhub, 2y, 45 comments)
28 Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability (Michaël Trazzi, 1y, 0 comments)
58 Partial Agency (abramdemski, 3y, 18 comments)
38 Bayesian Evolving-to-Extinction (abramdemski, 2y, 13 comments)
44 Towards a mechanistic understanding of corrigibility (evhub, 3y, 26 comments)
32 Defining Myopia (abramdemski, 3y, 18 comments)
87 Trying to Make a Treacherous Mesa-Optimizer (MadHatter, 1mo, 13 comments)
118 Monitoring for deceptive alignment (evhub, 3mo, 7 comments)
72 How likely is deceptive alignment? (evhub, 3mo, 21 comments)
35 Sticky goals: a concrete experiment for understanding deceptive alignment (evhub, 3mo, 13 comments)
18 Precursor checking for deceptive alignment (evhub, 4mo, 0 comments)
30 The Speed + Simplicity Prior is probably anti-deceptive (7mo, 29 comments)
23 Framings of Deceptive Alignment (peterbarnett, 7mo, 6 comments)
12 Training Trace Priors (Adam Jermyn, 6mo, 17 comments)
43 Will transparency help catch deception? Perhaps not (Matthew Barnett, 3y, 5 comments)