Tree of Tags

54 posts: Oracle AI, AI Boxing (Containment), Acausal Trade, Bounties (closed), Superrationality, Parables & Fables, Self Fulfilling/Refuting Prophecies, Values handshakes, Verification, Computer Security & Cryptography

291 The Parable of Predict-O-Matic · abramdemski · 3y · 42 comments

142 Decision theory does not imply that we get to have nice things · So8res · 2mo · 53 comments

60 Prize for probable problems · paulfchristiano · 4y · 63 comments

58 Results of $1,000 Oracle contest! · Stuart_Armstrong · 2y · 2 comments

57 Contest: $1,000 for good questions to ask to an Oracle AI · Stuart_Armstrong · 3y · 156 comments

55 Proper scoring rules don’t guarantee predicting fixed points · Johannes_Treutlein · 4d · 2 comments

54 Cryptographic Boxes for Unfriendly AI · paulfchristiano · 12y · 162 comments

47 Counterfactual Oracles = online supervised learning with random selection of training episodes · Wei_Dai · 3y · 26 comments

41 Superrational Agents Kelly Bet Influence! · abramdemski · 1y · 5 comments

35 Side-channels: input versus output · davidad · 8d · 9 comments

29 Self-Supervised Learning and AGI Safety · Steven Byrnes · 3y · 9 comments

26 Training goals for large language models · Johannes_Treutlein · 5mo · 5 comments

25 Breaking Oracles: superrationality and acausal trade · Stuart_Armstrong · 3y · 15 comments

24 Reflective oracles as a solution to the converse Lawvere problem · SamEisenstat · 4y · 0 comments

26 posts: Myopia, Deceptive Alignment, Deception

118 Monitoring for deceptive alignment · evhub · 3mo · 7 comments

91 The Credit Assignment Problem · abramdemski · 3y · 40 comments

87 Trying to Make a Treacherous Mesa-Optimizer · MadHatter · 1mo · 13 comments

72 How likely is deceptive alignment? · evhub · 3mo · 21 comments

58 Partial Agency · abramdemski · 3y · 18 comments

57 Open Problems with Myopia · Mark Xu · 1y · 16 comments

56 Arguments against myopic training · Richard_Ngo · 2y · 39 comments

55 Why GPT wants to mesa-optimize & how we might change this · John_Maxwell · 2y · 32 comments

55 AI safety via market making · evhub · 2y · 45 comments

50 LCDT, A Myopic Decision Theory · adamShimi · 1y · 51 comments

44 Towards a mechanistic understanding of corrigibility · evhub · 3y · 26 comments

43 Acceptability Verification: A Research Agenda · David Udell · 5mo · 0 comments

43 Will transparency help catch deception? Perhaps not · Matthew Barnett · 3y · 5 comments

38 Bayesian Evolving-to-Extinction · abramdemski · 2y · 13 comments