Tree of Tags

17 posts Myopia

9 posts Deceptive Alignment Deception

107 The Credit Assignment Problem

abramdemski

3y

40

67 Arguments against myopic training

Richard_Ngo

2y

39

65 Why GPT wants to mesa-optimize & how we might change this

John_Maxwell

2y

32

63 Partial Agency

abramdemski

3y

18

57 Open Problems with Myopia

Mark Xu

1y

16

57 AI safety via market making

evhub

2y

45

50 LCDT, A Myopic Decision Theory

adamShimi

1y

51

49 Bayesian Evolving-to-Extinction

abramdemski

2y

13

44 Towards a mechanistic understanding of corrigibility

evhub

3y

26

41 Steering Behaviour: Testing for (Non-)Myopia in Language Models

Evan R. Murphy

15d

16

41 Acceptability Verification: A Research Agenda

David Udell

5mo

0

37 Defining Myopia

abramdemski

3y

18

36 Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability

Michaël Trazzi

1y

0

31 Random Thoughts on Predict-O-Matic

abramdemski

3y

3

119 Monitoring for deceptive alignment

evhub

3mo

7

85 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

64 How likely is deceptive alignment?

evhub

3mo

21

47 Will transparency help catch deception? Perhaps not

Matthew Barnett

3y

5

33 The Speed + Simplicity Prior is probably anti-deceptive

7mo

29

30 Sticky goals: a concrete experiment for understanding deceptive alignment

evhub

3mo

13

17 Framings of Deceptive Alignment

peterbarnett

7mo

6

17 Precursor checking for deceptive alignment

evhub

4mo

0

8 Training Trace Priors

Adam Jermyn

6mo

17