Tree of Tags

28 posts: Debate (AI safety technique), Factored Cognition, Ought, Adversarial Collaboration

37 posts: Iterated Amplification, Humans Consulting HCH, Delegation

114 Supervise Process, not Outcomes

stuhlmueller

8mo

8

93 Ought: why it matters and ways to help

paulfchristiano

3y

7

91 Writeup: Progress on AI Safety via Debate

Beth Barnes

2y

18

78 Imitative Generalisation (AKA 'Learning the Prior')

Beth Barnes

1y

14

70 A guide to Iterated Amplification & Debate

Rafael Harth

2y

10

66 Why I'm excited about Debate

Richard_Ngo

1y

12

63 A Library and Tutorial for Factored Cognition with Language Models

stuhlmueller

2mo

0

61 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

49 Ought will host a factored cognition “Lab Meeting”

jungofthewon

3mo

1

49 A Small Negative Result on Debate

Sam Bowman

8mo

11

49 Factored Cognition

stuhlmueller

4y

6

44 AI Safety Debate and Its Applications

VojtaKovarik

3y

5

39 Looking for adversarial collaborators to test our Debate protocol

Beth Barnes

2y

5

37 How should AI debate be judged?

abramdemski

2y

27

132 Paul's research agenda FAQ

zhukeepa

4y

73

114 My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda

Chi Nguyen

2y

21

111 Debate update: Obfuscated arguments problem

Beth Barnes

1y

21

60 Model splintering: moving from one imperfect model to another

Stuart_Armstrong

2y

10

58 Relaxed adversarial training for inner alignment

evhub

3y

28

52 Machine Learning Projects on IDA

Owain_Evans

3y

3

44 Iterated Distillation and Amplification

Ajeya Cotra

4y

13

44 Garrabrant and Shah on human modeling in AGI

Rob Bensinger

1y

10

43 Preface to the sequence on iterated amplification

paulfchristiano

4y

8

42 Notes on OpenAI’s alignment plan

Alex Flint

12d

5

36 HCH is not just Mechanical Turk

William_S

3y

6

36 Can HCH epistemically dominate Ramanujan?

zhukeepa

3y

4

36 Directions and desiderata for AI alignment

paulfchristiano

3y

1

34 HCH Speculation Post #2A

Charlie Steiner

1y

7