Tree of Tags

28 posts: Debate (AI safety technique), Factored Cognition, Ought, Adversarial Collaboration

37 posts: Iterated Amplification, Humans Consulting HCH, Delegation

36 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

73 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

42 A Small Negative Result on Debate

Sam Bowman

8mo

11

118 Supervise Process, not Outcomes

stuhlmueller

8mo

8

35 Ought will host a factored cognition “Lab Meeting”

jungofthewon

3mo

1

36 AI Safety Debate and Its Applications

VojtaKovarik

3y

5

94 Writeup: Progress on AI Safety via Debate

Beth Barnes

2y

18

45 Factored Cognition

stuhlmueller

4y

6

16 Traversing a Cognition Space

Rafael Harth

2y

5

48 Vaniver's View on Factored Cognition

Vaniver

3y

4

92 Imitative Generalisation (AKA 'Learning the Prior')

Beth Barnes

1y

14

73 Why I'm excited about Debate

Richard_Ngo

1y

12

32 New paper: (When) is Truth-telling Favored in AI debate?

VojtaKovarik

2y

7

34 Idealized Factored Cognition

Rafael Harth

2y

6

47 Notes on OpenAI’s alignment plan

Alex Flint

12d

5

61 Relaxed adversarial training for inner alignment

evhub

3y

28

125 Paul's research agenda FAQ

zhukeepa

4y

73

20 Meta-execution

paulfchristiano

4y

1

41 HCH is not just Mechanical Turk

William_S

3y

6

34 Can HCH epistemically dominate Ramanujan?

zhukeepa

3y

4

7 Predicting HCH using expert advice

jessicata

6y

0

1 HCH as a measure of manipulation

orthonormal

5y

0

24 Reliability amplification

paulfchristiano

3y

3

21 Approval-directed bootstrapping

paulfchristiano

4y

0

30 Approval-directed agents

paulfchristiano

4y

11

42 Preface to the sequence on iterated amplification

paulfchristiano

4y

8

15 Mapping the Conceptual Territory in AI Existential Safety and Alignment

jbkjr

1y

0

45 Iterated Distillation and Amplification

Ajeya Cotra

4y

13