Tree of Tags

54 posts: Oracle AI, AI Boxing (Containment), Acausal Trade, Bounties (closed), Superrationality, Parables & Fables, Self Fulfilling/Refuting Prophecies, Values handshakes, Verification, Computer Security & Cryptography

26 posts: Myopia, Deceptive Alignment, Deception

67 Proper scoring rules don’t guarantee predicting fixed points

Johannes_Treutlein

4d

2

35 Side-channels: input versus output

davidad

8d

9

139 Decision theory does not imply that we get to have nice things

So8res

2mo

53

324 The Parable of Predict-O-Matic

abramdemski

3y

42

27 Training goals for large language models

Johannes_Treutlein

5mo

5

34 Superrational Agents Kelly Bet Influence!

abramdemski

1y

5

49 Results of $1,000 Oracle contest!

Stuart_Armstrong

2y

2

54 Contest: $1,000 for good questions to ask to an Oracle AI

Stuart_Armstrong

3y

156

37 Counterfactual Oracles = online supervised learning with random selection of training episodes

Wei_Dai

3y

26

48 Prize for probable problems

paulfchristiano

4y

63

22 Breaking Oracles: superrationality and acausal trade

Stuart_Armstrong

3y

15

22 Self-Supervised Learning and AGI Safety

Steven Byrnes

3y

9

17 Analysing: Dangerous messages from future UFAI via Oracles

Stuart_Armstrong

3y

16

17 Oracles, sequence predictors, and self-confirming predictions

Stuart_Armstrong

3y

0

33 Steering Behaviour: Testing for (Non-)Myopia in Language Models

Evan R. Murphy

15d

16

89 Trying to Make a Treacherous Mesa-Optimizer

MadHatter

1mo

13

117 Monitoring for deceptive alignment

evhub

3mo

7

80 How likely is deceptive alignment?

evhub

3mo

21

40 Sticky goals: a concrete experiment for understanding deceptive alignment

evhub

3mo

13

45 Acceptability Verification: A Research Agenda

David Udell

5mo

0

19 Precursor checking for deceptive alignment

evhub

4mo

0

29 Framings of Deceptive Alignment

peterbarnett

7mo

6

27 The Speed + Simplicity Prior is probably anti-deceptive

7mo

29

50 LCDT, A Myopic Decision Theory

adamShimi

1y

51

16 Training Trace Priors

Adam Jermyn

6mo

17

31 Understanding and controlling auto-induced distributional shift

LRudL

1y

3

57 Open Problems with Myopia

Mark Xu

1y

16

75 The Credit Assignment Problem

abramdemski

3y

40