Iterated Amplification (23 posts)

14 posts Humans Consulting HCH Delegation

42 Notes on OpenAI’s alignment plan (Alex Flint, 12d, 5 comments)
111 Debate update: Obfuscated arguments problem (Beth Barnes, 1y, 21 comments)
114 My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda (Chi Nguyen, 2y, 21 comments)
132 Paul's research agenda FAQ (zhukeepa, 4y, 73 comments)
60 Model splintering: moving from one imperfect model to another (Stuart_Armstrong, 2y, 10 comments)
58 Relaxed adversarial training for inner alignment (evhub, 3y, 28 comments)
52 Machine Learning Projects on IDA (Owain_Evans, 3y, 3 comments)
44 Iterated Distillation and Amplification (Ajeya Cotra, 4y, 13 comments)
43 Preface to the sequence on iterated amplification (paulfchristiano, 4y, 8 comments)
28 Synthesizing amplification and debate (evhub, 2y, 10 comments)
36 Directions and desiderata for AI alignment (paulfchristiano, 3y, 1 comment)
28 Approval-directed agents (paulfchristiano, 4y, 11 comments)
25 Supervising strong learners by amplifying weak experts (paulfchristiano, 3y, 1 comment)
24 Reinforcement Learning in the Iterated Amplification Framework (William_S, 3y, 12 comments)
44 Garrabrant and Shah on human modeling in AGI (Rob Bensinger, 1y, 10 comments)
34 HCH Speculation Post #2A (Charlie Steiner, 1y, 7 comments)
16 Universality and the “Filter” (maggiehayes, 1y, 3 comments)
23 Universality Unwrapped (adamShimi, 2y, 2 comments)
36 Can HCH epistemically dominate Ramanujan? (zhukeepa, 3y, 4 comments)
36 HCH is not just Mechanical Turk (William_S, 3y, 6 comments)
14 Mapping the Conceptual Territory in AI Existential Safety and Alignment (jbkjr, 1y, 0 comments)
26 What are the differences between all the iterative/recursive approaches to AI alignment? (riceissa, 3y, 14 comments)
30 Humans Consulting HCH (paulfchristiano, 4y, 10 comments)
26 What's wrong with these analogies for understanding Informed Oversight and IDA? (Wei_Dai, 3y, 3 comments)
19 Towards formalizing universality (paulfchristiano, 3y, 19 comments)
12 Meta-execution (paulfchristiano, 4y, 1 comment)
8 Predicting HCH using expert advice (jessicata, 6y, 0 comments)
1 HCH as a measure of manipulation (orthonormal, 5y, 0 comments)