28 posts: Debate (AI safety technique), Factored Cognition, Ought, Adversarial Collaboration
37 posts: Iterated Amplification, Humans Consulting HCH, Delegation
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. — Charlie Steiner, 8d (36 karma, 14 comments)
A Library and Tutorial for Factored Cognition with Language Models — stuhlmueller, 2mo (47 karma, 0 comments)
Rant on Problem Factorization for Alignment — johnswentworth, 4mo (73 karma, 48 comments)
Supervise Process, not Outcomes — stuhlmueller, 8mo (118 karma, 8 comments)
Ought will host a factored cognition “Lab Meeting” — jungofthewon, 3mo (35 karma, 1 comment)
A Small Negative Result on Debate — Sam Bowman, 8mo (42 karma, 11 comments)
Imitative Generalisation (AKA 'Learning the Prior') — Beth Barnes, 1y (92 karma, 14 comments)
Why I'm excited about Debate — Richard_Ngo, 1y (73 karma, 12 comments)
A guide to Iterated Amplification & Debate — Rafael Harth, 2y (68 karma, 10 comments)
Writeup: Progress on AI Safety via Debate — Beth Barnes, 2y (94 karma, 18 comments)
Ought: why it matters and ways to help — paulfchristiano, 3y (87 karma, 7 comments)
Looking for adversarial collaborators to test our Debate protocol — Beth Barnes, 2y (52 karma, 5 comments)
How should AI debate be judged? — abramdemski, 2y (49 karma, 27 comments)
Debate Minus Factored Cognition — abramdemski, 1y (37 karma, 42 comments)
Notes on OpenAI’s alignment plan — Alex Flint, 12d (47 karma, 5 comments)
Debate update: Obfuscated arguments problem — Beth Barnes, 1y (125 karma, 21 comments)
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda — Chi Nguyen, 2y (119 karma, 21 comments)
Garrabrant and Shah on human modeling in AGI — Rob Bensinger, 1y (57 karma, 10 comments)
Model splintering: moving from one imperfect model to another — Stuart_Armstrong, 2y (74 karma, 10 comments)
Paul's research agenda FAQ — zhukeepa, 4y (125 karma, 73 comments)
HCH Speculation Post #2A — Charlie Steiner, 1y (42 karma, 7 comments)
Relaxed adversarial training for inner alignment — evhub, 3y (61 karma, 28 comments)
Machine Learning Projects on IDA — Owain_Evans, 3y (49 karma, 3 comments)
Universality Unwrapped — adamShimi, 2y (28 karma, 2 comments)
Universality and the “Filter” — maggiehayes, 1y (10 karma, 3 comments)
Synthesizing amplification and debate — evhub, 2y (33 karma, 10 comments)
Directions and desiderata for AI alignment — paulfchristiano, 3y (47 karma, 1 comment)
Iterated Distillation and Amplification — Ajeya Cotra, 4y (45 karma, 13 comments)