Iterated Amplification (26 posts) · Humans Consulting HCH (17 posts)
Karma | Title | Author | Age | Comments
47 | Notes on OpenAI's alignment plan | Alex Flint | 12d | 5
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
125 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
125 | Paul's research agenda FAQ | zhukeepa | 4y | 73
26 | The reward engineering problem | paulfchristiano | 3y | 3
24 | Reliability amplification | paulfchristiano | 3y | 3
21 | Approval-directed bootstrapping | paulfchristiano | 4y | 0
30 | Approval-directed agents | paulfchristiano | 4y | 11
20 | Explanation of Paul's AI-Alignment agenda by Ajeya Cotra | habryka | 4y | 0
42 | Preface to the sequence on iterated amplification | paulfchristiano | 4y | 8
45 | Iterated Distillation and Amplification | Ajeya Cotra | 4y | 13
17 | Amplification Discussion Notes | William_S | 4y | 3
15 | Benign model-free RL | paulfchristiano | 4y | 1
119 | My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda | Chi Nguyen | 2y | 21
20 | Meta-execution | paulfchristiano | 4y | 1
41 | HCH is not just Mechanical Turk | William_S | 3y | 6
34 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
7 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0
16 | Epistemology of HCH | adamShimi | 1y | 2
15 | Mapping the Conceptual Territory in AI Existential Safety and Alignment | jbkjr | 1y | 0
35 | What's wrong with these analogies for understanding Informed Oversight and IDA? | Wei_Dai | 3y | 3
27 | Towards formalizing universality | paulfchristiano | 3y | 19
42 | HCH Speculation Post #2A | Charlie Steiner | 1y | 7
32 | Humans Consulting HCH | paulfchristiano | 4y | 10
47 | Relating HCH and Logical Induction | abramdemski | 2y | 4
10 | Universality and the “Filter” | maggiehayes | 1y | 3
57 | Garrabrant and Shah on human modeling in AGI | Rob Bensinger | 1y | 10