69 posts: Debate (AI safety technique), Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning
43 posts: Iterated Amplification, Humans Consulting HCH
Karma | Title | Author | Age | Comments
36 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
151 | Godzilla Strategies | johnswentworth | 6mo | 65
7 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 7d | 3
16 | Research request (alignment strategy): Deep dive on "making AI solve alignment for us" | JanBrauner | 19d | 3
73 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
10 | Provably Honest - A First Step | Srijanak De | 1mo | 2
27 | AI Safety via Debate | ESRogs | 4y | 13
9 | Getting from an unaligned AGI to an aligned AGI? | Tor Økland Barstad | 6mo | 7
80 | Air Conditioner Test Results & Discussion | johnswentworth | 6mo | 38
8 | AI-assisted list of ten concrete alignment things to do right now | lcmgcd | 3mo | 5
118 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
92 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
20 | Briefly thinking through some analogs of debate | Eli Tyre | 3mo | 3
14 | Making it harder for an AGI to "trick" us, with STVs | Tor Økland Barstad | 5mo | 5
47 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
125 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
125 | Paul's research agenda FAQ | zhukeepa | 4y | 73
20 | Meta-execution | paulfchristiano | 4y | 1
41 | HCH is not just Mechanical Turk | William_S | 3y | 6
26 | The reward engineering problem | paulfchristiano | 3y | 3
34 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
7 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0
24 | Reliability amplification | paulfchristiano | 3y | 3
21 | Approval-directed bootstrapping | paulfchristiano | 4y | 0
30 | Approval-directed agents | paulfchristiano | 4y | 11
16 | Epistemology of HCH | adamShimi | 1y | 2