Tree of Tags

46 posts: Research Agendas, Game Theory, Center on Long-Term Risk (CLR), Risks of Astronomical Suffering (S-risks), Mechanism Design, Suffering, Fairness, Blackmail / Extortion, Group Rationality, Terminology / Jargon (meta), Reading Group, Mind Crime

65 posts: Iterated Amplification, Debate (AI safety technique), Factored Cognition, Humans Consulting HCH, Ought, Adversarial Collaboration, Delegation

49 «Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch

6d

2

34 My AGI safety research—2022 review, ’23 plans

Steven Byrnes

6d

6

258 On how various plans miss the hard bits of the alignment challenge

So8res

5mo

81

168 Some conceptual alignment research projects

Richard_Ngo

3mo

14

189 Unifying Bargaining Notions (1/2)

Diffractor

4mo

38

89 Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

Diffractor

2mo

11

16 Theories of impact for Science of Deep Learning

Marius Hobbhahn

19d

0

140 «Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch

4mo

26

102 Unifying Bargaining Notions (2/2)

Diffractor

4mo

11

15 Distilled Representations Research Agenda

Hoagy

2mo

2

17 Announcing: Mechanism Design for AI Safety - Reading Group

Rubi J. Hudson

4mo

3

106 "Zero Sum" is a misnomer.

abramdemski

2y

35

112 Our take on CHAI’s research agenda in under 1500 words

Alex Flint

2y

19

54 Research agenda update

Steven Byrnes

1y

40

36 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

47 Notes on OpenAI’s alignment plan

Alex Flint

12d

5

47 A Library and Tutorial for Factored Cognition with Language Models

stuhlmueller

2mo

0

73 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

118 Supervise Process, not Outcomes

stuhlmueller

8mo

8

35 Ought will host a factored cognition “Lab Meeting”

jungofthewon

3mo

1

42 A Small Negative Result on Debate

Sam Bowman

8mo

11

125 Debate update: Obfuscated arguments problem

Beth Barnes

1y

21

119 My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda

Chi Nguyen

2y

21

92 Imitative Generalisation (AKA 'Learning the Prior')

Beth Barnes

1y

14

57 Garrabrant and Shah on human modeling in AGI

Rob Bensinger

1y

10

73 Why I'm excited about Debate

Richard_Ngo

1y

12

68 A guide to Iterated Amplification & Debate

Rafael Harth

2y

10

74 Model splintering: moving from one imperfect model to another

Stuart_Armstrong

2y

10