Game Theory (26 posts)
Tags: Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group

Posts (karma | title | author | posted | comments):
 54 | «Boundaries», Part 3b: Alignment problems in terms of boundaries | Andrew_Critch | 6d | 2
155 | «Boundaries», Part 1: a key missing concept from utility theory | Andrew_Critch | 4mo | 26
 98 | Threat-Resistant Bargaining Megapost: Introducing the ROSE Value | Diffractor | 2mo | 11
204 | Unifying Bargaining Notions (1/2) | Diffractor | 4mo | 38
120 | Unifying Bargaining Notions (2/2) | Diffractor | 4mo | 11
  1 | Announcing: Mechanism Design for AI Safety - Reading Group | Rubi J. Hudson | 4mo | 3
 26 | Sections 5 & 6: Contemporary Architectures, Humans in the Loop | JesseClifton | 3y | 4
 41 | Formal Open Problem in Decision Theory | Scott Garrabrant | 4y | 11
 24 | The Ubiquitous Converse Lawvere Problem | Scott Garrabrant | 4y | 0
 28 | Hyperreal Brouwer | Scott Garrabrant | 4y | 0
102 | What counts as defection? | TurnTrout | 2y | 21
 15 | Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms | JesseClifton | 3y | 2
 44 | My take on higher-order game theory | Nisan | 1y | 6
131 | "Zero Sum" is a misnomer. | abramdemski | 2y | 35
Research Agendas (20 posts)
Tags: Goal Factoring, Mind Crime

Posts (karma | title | author | posted | comments):
 35 | My AGI safety research—2022 review, ’23 plans | Steven Byrnes | 6d | 6
231 | On how various plans miss the hard bits of the alignment challenge | So8res | 5mo | 81
155 | Some conceptual alignment research projects | Richard_Ngo | 3mo | 14
 10 | Distilled Representations Research Agenda | Hoagy | 2mo | 2
 38 | Resources for AI Alignment Cartography | Gyrodiot | 2y | 8
 44 | Technical AGI safety research outside AI | Richard_Ngo | 3y | 3
 36 | Why I am not currently working on the AAMLS agenda | jessicata | 5y | 1
 77 | The Learning-Theoretic AI Alignment Research Agenda | Vanessa Kosoy | 4y | 39
127 | Thoughts on Human Models | Ramana Kumar | 3y | 32
 82 | Research Agenda v0.9: Synthesising a human's preferences into a utility function | Stuart_Armstrong | 3y | 25
 69 | Research agenda update | Steven Byrnes | 1y | 40
 42 | New safety research agenda: scalable agent alignment via reward modeling | Vika | 4y | 13
 44 | Research Agenda in reverse: what *would* a solution look like? | Stuart_Armstrong | 3y | 25
  4 | Acknowledgements & References | JesseClifton | 3y | 0