Tree of Tags

69 posts: Debate (AI safety technique), Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning

43 posts: Iterated Amplification, Humans Consulting HCH

46 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

118 Godzilla Strategies

johnswentworth

6mo

65

3 Alignment with argument-networks and assessment-predictions

Tor Økland Barstad

7d

3

11 Research request (alignment strategy): Deep dive on "making AI solve alignment for us"

JanBrauner

19d

3

84 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

3 Provably Honest - A First Step

Srijanak De

1mo

2

34 AI Safety via Debate

ESRogs

4y

13

5 Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad

6mo

7

80 Air Conditioner Test Results & Discussion

johnswentworth

6mo

38

4 AI-assisted list of ten concrete alignment things to do right now

lcmgcd

3mo

5

116 Supervise Process, not Outcomes

stuhlmueller

8mo

8

79 Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie

3mo

4

27 Briefly thinking through some analogs of debate

Eli Tyre

3mo

3

0 Making it harder for an AGI to "trick" us, with STVs

Tor Økland Barstad

5mo

5

50 Notes on OpenAI’s alignment plan

Alex Flint

12d

5

61 Relaxed adversarial training for inner alignment

evhub

3y

28

132 Debate update: Obfuscated arguments problem

Beth Barnes

1y

21

111 Paul's research agenda FAQ

zhukeepa

4y

73

27 Meta-execution

paulfchristiano

4y

1

45 HCH is not just Mechanical Turk

William_S

3y

6

28 The reward engineering problem

paulfchristiano

3y

3

31 Can HCH epistemically dominate Ramanujan?

zhukeepa

3y

4

6 Predicting HCH using expert advice

jessicata

6y

0

1 HCH as a measure of manipulation

orthonormal

5y

0

27 Reliability amplification

paulfchristiano

3y

3

28 Approval-directed bootstrapping

paulfchristiano

4y

0

30 Approval-directed agents

paulfchristiano

4y

11

20 Epistemology of HCH

adamShimi

1y

2