Tree of Tags

28 posts Debate (AI safety technique) Factored Cognition Ought Adversarial Collaboration

37 posts Iterated Amplification Humans Consulting HCH Delegation

122 Supervise Process, not Outcomes

stuhlmueller

8mo

8

106 Imitative Generalisation (AKA 'Learning the Prior')

Beth Barnes

1y

14

97 Writeup: Progress on AI Safety via Debate

Beth Barnes

2y

18

85 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

81 Ought: why it matters and ways to help

paulfchristiano

3y

7

80 Why I'm excited about Debate

Richard_Ngo

1y

12

66 A guide to Iterated Amplification & Debate

Rafael Harth

2y

10

65 Vaniver's View on Factored Cognition

Vaniver

3y

4

65 Looking for adversarial collaborators to test our Debate protocol

Beth Barnes

2y

5

61 How should AI debate be judged?

abramdemski

2y

27

47 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

47 Debate Minus Factored Cognition

abramdemski

1y

42

45 Preface to the Sequence on Factored Cognition

Rafael Harth

2y

7

42 Idealized Factored Cognition

Rafael Harth

2y

6

139 Debate update: Obfuscated arguments problem

Beth Barnes

1y

21

124 My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda

Chi Nguyen

2y

21

118 Paul's research agenda FAQ

zhukeepa

4y

73

88 Model splintering: moving from one imperfect model to another

Stuart_Armstrong

2y

10

70 Garrabrant and Shah on human modeling in AGI

Rob Bensinger

1y

10

64 Relaxed adversarial training for inner alignment

evhub

3y

28

58 Directions and desiderata for AI alignment

paulfchristiano

3y

1

52 Notes on OpenAI’s alignment plan

Alex Flint

12d

5

50 HCH Speculation Post #2A

Charlie Steiner

1y

7

46 HCH is not just Mechanical Turk

William_S

3y

6

46 Iterated Distillation and Amplification

Ajeya Cotra

4y

13

46 Machine Learning Projects on IDA

Owain_Evans

3y

3

45 Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S

4y

30

44 What's wrong with these analogies for understanding Informed Oversight and IDA?

Wei_Dai

3y

3