Tree of Tags

26 posts: Game Theory, Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group

20 posts: Research Agendas, Goal Factoring, Mind Crime

44 «Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch

6d

2

125 «Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch

4mo

26

80 Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Diffractor

2mo

11

174 Unifying Bargaining Notions (1/2)

Diffractor

4mo

38

84 Unifying Bargaining Notions (2/2)

Diffractor

4mo

11

33 Announcing: Mechanism Design for AI Safety - Reading Group

Rubi J. Hudson

4mo

3

28 Sections 5 & 6: Contemporary Architectures, Humans in the Loop

JesseClifton

3y

4

29 Formal Open Problem in Decision Theory

Scott Garrabrant

4y

11

18 The Ubiquitous Converse Lawvere Problem

Scott Garrabrant

4y

0

32 Hyperreal Brouwer

Scott Garrabrant

4y

0

60 What counts as defection?

TurnTrout

2y

21

23 Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms

JesseClifton

3y

2

32 My take on higher-order game theory

Nisan

1y

6

81 "Zero Sum" is a misnomer.

abramdemski

2y

35

33 My AGI safety research—2022 review, ’23 plans

Steven Byrnes

6d

6

285 On how various plans miss the hard bits of the alignment challenge

So8res

5mo

81

181 Some conceptual alignment research projects

Richard_Ngo

3mo

14

20 Distilled Representations Research Agenda

Hoagy

2mo

2

52 Resources for AI Alignment Cartography

Gyrodiot

2y

8

42 Technical AGI safety research outside AI

Richard_Ngo

3y

3

20 Why I am not currently working on the AAMLS agenda

jessicata

5y

1

75 The Learning-Theoretic AI Alignment Research Agenda

Vanessa Kosoy

4y

39

121 Thoughts on Human Models

Ramana Kumar

3y

32

52 Research Agenda v0.9: Synthesising a human's preferences into a utility function

Stuart_Armstrong

3y

25

39 Research agenda update

Steven Byrnes

1y

40

26 New safety research agenda: scalable agent alignment via reward modeling

Vika

4y

13

24 Research Agenda in reverse: what *would* a solution look like?

Stuart_Armstrong

3y

25

8 Acknowledgements & References

JesseClifton

3y

0