54 posts tagged: Oracle AI, AI Boxing (Containment), Acausal Trade, Bounties (closed), Superrationality, Parables & Fables, Self Fulfilling/Refuting Prophecies, Values handshakes, Verification, Computer Security & Cryptography
26 posts tagged: Myopia, Deceptive Alignment, Deception
| Karma | Title | Author | Age | Comments |
|---|---|---|---|---|
| 43 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 35 | Side-channels: input versus output | davidad | 8d | 9 |
| 145 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42 |
| 25 | Training goals for large language models | Johannes_Treutlein | 5mo | 5 |
| 48 | Superrational Agents Kelly Bet Influence! | abramdemski | 1y | 5 |
| 67 | Results of $1,000 Oracle contest! | Stuart_Armstrong | 2y | 2 |
| 60 | Contest: $1,000 for good questions to ask to an Oracle AI | Stuart_Armstrong | 3y | 156 |
| 57 | Counterfactual Oracles = online supervised learning with random selection of training episodes | Wei_Dai | 3y | 26 |
| 72 | Prize for probable problems | paulfchristiano | 4y | 63 |
| 36 | Self-Supervised Learning and AGI Safety | Steven Byrnes | 3y | 9 |
| 28 | Oracles: reject all deals - break superrationality, with superrationality | Stuart_Armstrong | 3y | 4 |
| 28 | Breaking Oracles: superrationality and acausal trade | Stuart_Armstrong | 3y | 15 |
| 27 | Analysing: Dangerous messages from future UFAI via Oracles | Stuart_Armstrong | 3y | 16 |
| 41 | Steering Behaviour: Testing for (Non-)Myopia in Language Models | Evan R. Murphy | 15d | 16 |
| 85 | Trying to Make a Treacherous Mesa-Optimizer | MadHatter | 1mo | 13 |
| 119 | Monitoring for deceptive alignment | evhub | 3mo | 7 |
| 64 | How likely is deceptive alignment? | evhub | 3mo | 21 |
| 30 | Sticky goals: a concrete experiment for understanding deceptive alignment | evhub | 3mo | 13 |
| 41 | Acceptability Verification: A Research Agenda | David Udell | 5mo | 0 |
| 33 | The Speed + Simplicity Prior is probably anti-deceptive | | 7mo | 29 |
| 17 | Precursor checking for deceptive alignment | evhub | 4mo | 0 |
| 50 | LCDT, A Myopic Decision Theory | adamShimi | 1y | 51 |
| 57 | Open Problems with Myopia | Mark Xu | 1y | 16 |
| 107 | The Credit Assignment Problem | abramdemski | 3y | 40 |
| 17 | Framings of Deceptive Alignment | peterbarnett | 7mo | 6 |
| 65 | Why GPT wants to mesa-optimize & how we might change this | John_Maxwell | 2y | 32 |
| 67 | Arguments against myopic training | Richard_Ngo | 2y | 39 |