Tags (69 posts): Debate (AI safety technique), Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning
Tags (43 posts): Iterated Amplification, Humans Consulting HCH
Karma | Title | Author | Posted | Comments
46 | Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. | Charlie Steiner | 8d | 14
11 | Research request (alignment strategy): Deep dive on "making AI solve alignment for us" | JanBrauner | 19d | 3
79 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
84 | Rant on Problem Factorization for Alignment | johnswentworth | 4mo | 48
118 | Godzilla Strategies | johnswentworth | 6mo | 65
3 | Alignment with argument-networks and assessment-predictions | Tor Økland Barstad | 7d | 3
80 | Air Conditioner Test Results & Discussion | johnswentworth | 6mo | 38
116 | Supervise Process, not Outcomes | stuhlmueller | 8mo | 8
106 | Preregistration: Air Conditioner Test | johnswentworth | 8mo | 64
28 | A Library and Tutorial for Factored Cognition with Language Models | stuhlmueller | 2mo | 0
27 | Briefly thinking through some analogs of debate | Eli Tyre | 3mo | 3
18 | Ought will host a factored cognition “Lab Meeting” | jungofthewon | 3mo | 1
8 | Infinite Possibility Space and the Shutdown Problem | magfrump | 2mo | 0
45 | Scientific Wrestling: Beyond Passive Hypothesis-Testing | adamShimi | 9mo | 6
50 | Notes on OpenAI’s alignment plan | Alex Flint | 12d | 5
132 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21
68 | Garrabrant and Shah on human modeling in AGI | Rob Bensinger | 1y | 10
117 | My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda | Chi Nguyen | 2y | 21
11 | Surprised by ELK report's counterexample to Debate, IDA | Evan R. Murphy | 4mo | 0
85 | Model splintering: moving from one imperfect model to another | Stuart_Armstrong | 2y | 10
48 | HCH Speculation Post #2A | Charlie Steiner | 1y | 7
60 | Relating HCH and Logical Induction | abramdemski | 2y | 4
111 | Paul's research agenda FAQ | zhukeepa | 4y | 73
61 | Relaxed adversarial training for inner alignment | evhub | 3y | 28
12 | HCH and Adversarial Questions | David Udell | 10mo | 7
32 | Universality Unwrapped | adamShimi | 2y | 2
56 | Directions and desiderata for AI alignment | paulfchristiano | 3y | 1
37 | Synthesizing amplification and debate | evhub | 2y | 10