Tree of Tags

28 posts: Debate (AI safety technique), Factored Cognition, Ought, Adversarial Collaboration

37 posts: Iterated Amplification, Humans Consulting HCH, Delegation

47 karma · Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. · Charlie Steiner · 8d · 14 comments
85 karma · Rant on Problem Factorization for Alignment · johnswentworth · 4mo · 48 comments
122 karma · Supervise Process, not Outcomes · stuhlmueller · 8mo · 8 comments
31 karma · A Library and Tutorial for Factored Cognition with Language Models · stuhlmueller · 2mo · 0 comments
21 karma · Ought will host a factored cognition “Lab Meeting” · jungofthewon · 3mo · 1 comment
35 karma · A Small Negative Result on Debate · Sam Bowman · 8mo · 11 comments
106 karma · Imitative Generalisation (AKA 'Learning the Prior') · Beth Barnes · 1y · 14 comments
80 karma · Why I'm excited about Debate · Richard_Ngo · 1y · 12 comments
97 karma · Writeup: Progress on AI Safety via Debate · Beth Barnes · 2y · 18 comments
66 karma · A guide to Iterated Amplification & Debate · Rafael Harth · 2y · 10 comments
65 karma · Looking for adversarial collaborators to test our Debate protocol · Beth Barnes · 2y · 5 comments
61 karma · How should AI debate be judged? · abramdemski · 2y · 27 comments
47 karma · Debate Minus Factored Cognition · abramdemski · 1y · 42 comments
81 karma · Ought: why it matters and ways to help · paulfchristiano · 3y · 7 comments
52 karma · Notes on OpenAI’s alignment plan · Alex Flint · 12d · 5 comments
139 karma · Debate update: Obfuscated arguments problem · Beth Barnes · 1y · 21 comments
70 karma · Garrabrant and Shah on human modeling in AGI · Rob Bensinger · 1y · 10 comments
124 karma · My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · Chi Nguyen · 2y · 21 comments
88 karma · Model splintering: moving from one imperfect model to another · Stuart_Armstrong · 2y · 10 comments
50 karma · HCH Speculation Post #2A · Charlie Steiner · 1y · 7 comments
118 karma · Paul's research agenda FAQ · zhukeepa · 4y · 73 comments
64 karma · Relaxed adversarial training for inner alignment · evhub · 3y · 28 comments
33 karma · Universality Unwrapped · adamShimi · 2y · 2 comments
58 karma · Directions and desiderata for AI alignment · paulfchristiano · 3y · 1 comment
38 karma · Synthesizing amplification and debate · evhub · 2y · 10 comments
46 karma · Machine Learning Projects on IDA · Owain_Evans · 3y · 3 comments
46 karma · HCH is not just Mechanical Turk · William_S · 3y · 6 comments
44 karma · What's wrong with these analogies for understanding Informed Oversight and IDA? · Wei_Dai · 3y · 3 comments