Tree of Tags

54 posts: Oracle AI, AI Boxing (Containment), Acausal Trade, Bounties (closed), Superrationality, Parables & Fables, Self Fulfilling/Refuting Prophecies, Values handshakes, Verification, Computer Security & Cryptography

26 posts: Myopia, Deceptive Alignment, Deception

258 karma · The Parable of Predict-O-Matic · abramdemski · 3y · 42 comments
145 karma · Decision theory does not imply that we get to have nice things · So8res · 2mo · 53 comments
72 karma · Prize for probable problems · paulfchristiano · 4y · 63 comments
67 karma · Results of $1,000 Oracle contest! · Stuart_Armstrong · 2y · 2 comments
65 karma · Cryptographic Boxes for Unfriendly AI · paulfchristiano · 12y · 162 comments
60 karma · Contest: $1,000 for good questions to ask to an Oracle AI · Stuart_Armstrong · 3y · 156 comments
57 karma · Counterfactual Oracles = online supervised learning with random selection of training episodes · Wei_Dai · 3y · 26 comments
48 karma · Superrational Agents Kelly Bet Influence! · abramdemski · 1y · 5 comments
43 karma · Proper scoring rules don’t guarantee predicting fixed points · Johannes_Treutlein · 4d · 2 comments
36 karma · Self-Supervised Learning and AGI Safety · Steven Byrnes · 3y · 9 comments
35 karma · Side-channels: input versus output · davidad · 8d · 9 comments
33 karma · Safely and usefully spectating on AIs optimizing over toy worlds · AlexMennen · 4y · 16 comments
32 karma · Reflective oracles as a solution to the converse Lawvere problem · SamEisenstat · 4y · 0 comments
29 karma · Superrationality in arbitrary games · Vanessa Kosoy · 7y · 0 comments

119 karma · Monitoring for deceptive alignment · evhub · 3mo · 7 comments
107 karma · The Credit Assignment Problem · abramdemski · 3y · 40 comments
85 karma · Trying to Make a Treacherous Mesa-Optimizer · MadHatter · 1mo · 13 comments
67 karma · Arguments against myopic training · Richard_Ngo · 2y · 39 comments
65 karma · Why GPT wants to mesa-optimize & how we might change this · John_Maxwell · 2y · 32 comments
64 karma · How likely is deceptive alignment? · evhub · 3mo · 21 comments
63 karma · Partial Agency · abramdemski · 3y · 18 comments
57 karma · Open Problems with Myopia · Mark Xu · 1y · 16 comments
57 karma · AI safety via market making · evhub · 2y · 45 comments
50 karma · LCDT, A Myopic Decision Theory · adamShimi · 1y · 51 comments
49 karma · Bayesian Evolving-to-Extinction · abramdemski · 2y · 13 comments
47 karma · Will transparency help catch deception? Perhaps not · Matthew Barnett · 3y · 5 comments
44 karma · Towards a mechanistic understanding of corrigibility · evhub · 3y · 26 comments
41 karma · Steering Behaviour: Testing for (Non-)Myopia in Language Models · Evan R. Murphy · 15d · 16 comments