Tags (69 posts): Debate (AI safety technique), Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning
Tags (43 posts): Iterated Amplification, Humans Consulting HCH
Karma | Title | Author | Posted | Comments
26 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
184 | Godzilla Strategies | johnswentworth | 6mo | 65
11 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 7d | 3
21 | Research request (alignment strategy): Deep dive on "making AI solve alignment for us" | JanBrauner | 19d | 3
62 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
17 | Provably Honest - A First Step | Srijanak De | 1mo | 2
20 | AI Safety via Debate | ESRogs | 4y | 13
13 | Getting from an unaligned AGI to an aligned AGI? | Tor Økland Barstad | 6mo | 7
80 | Air Conditioner Test Results & Discussion | johnswentworth | 6mo | 38
12 | AI-assisted list of ten concrete alignment things to do right now | lcmgcd | 3mo | 5
120 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
105 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
13 | Briefly thinking through some analogs of debate | Eli Tyre | 3mo | 3
28 | Making it harder for an AGI to "trick" us, with STVs | Tor Økland Barstad | 5mo | 5
44 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
118 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
139 | Paul's research agenda FAQ | zhukeepa | 4y | 73
13 | Meta-execution | paulfchristiano | 4y | 1
37 | HCH is not just Mechanical Turk | William_S | 3y | 6
24 | The reward engineering problem | paulfchristiano | 3y | 3
37 | Can HCH epistemically dominate Ramanujan? | zhukeepa | 3y | 4
8 | Predicting HCH using expert advice | jessicata | 6y | 0
1 | HCH as a measure of manipulation | orthonormal | 5y | 0
21 | Reliability amplification | paulfchristiano | 3y | 3
14 | Approval-directed bootstrapping | paulfchristiano | 4y | 0
30 | Approval-directed agents | paulfchristiano | 4y | 11
12 | Epistemology of HCH | adamShimi | 1y | 2