46 posts
Tags: Research Agendas, Game Theory, Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group, Mind Crime
65 posts
Tags: Iterated Amplification, Debate (AI safety technique), Factored Cognition, Humans Consulting HCH, Ought, Adversarial Collaboration, Delegation
Karma | Title | Author | Posted | Comments
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
204 | Unifying Bargaining Notions (1/2) | Diffractor | 4mo | 38
155 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
155 | «Boundaries», Part 1: a key missing concept from utility theory | Andrew_Critch | 4mo | 26
131 | "Zero Sum" is a misnomer. | abramdemski | 2y | 35
127 | Thoughts on Human Models | Ramana Kumar | 3y | 32
122 | Our take on CHAI’s research agenda in under 1500 words | Alex Flint | 2y | 19
120 | Unifying Bargaining Notions (2/2) | Diffractor | 4mo | 11
102 | What counts as defection? | TurnTrout | 2y | 21
101 | The Commitment Races problem | Daniel Kokotajlo | 3y | 39
98 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor | 2mo | 11
86 | When Hindsight Isn't 20/20: Incentive Design With Imperfect Credit Allocation | johnswentworth | 2y | 24
82 | Research Agenda v0.9: Synthesising a human's preferences into a utility function | Stuart_Armstrong | 3y | 25
79 | Siren worlds and the perils of over-optimised search | Stuart_Armstrong | 8y | 417
Karma | Title | Author | Posted | Comments
139 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
124 | My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda | Chi Nguyen | 2y | 21
122 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
118 | Paul's research agenda FAQ | zhukeepa | 4y | 73
106 | Imitative Generalisation (AKA 'Learning the Prior') | Beth Barnes | 1y | 14
97 | Writeup: Progress on AI Safety via Debate | Beth Barnes | 2y | 18
88 | Model splintering: moving from one imperfect model to another | Stuart_Armstrong | 2y | 10
85 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
81 | Ought: why it matters and ways to help | paulfchristiano | 3y | 7
80 | Why I'm excited about Debate | Richard_Ngo | 1y | 12
70 | Garrabrant and Shah on human modeling in AGI | Rob Bensinger | 1y | 10
66 | A guide to Iterated Amplification & Debate | Rafael Harth | 2y | 10
65 | Vaniver's View on Factored Cognition | Vaniver | 3y | 4
65 | Looking for adversarial collaborators to test our Debate protocol | Beth Barnes | 2y | 5