Tree of Tags


46 posts: Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning

23 posts: Debate (AI safety technique)

151 Godzilla Strategies

johnswentworth

6mo

65

7 Alignment with argument-networks and assessment-predictions

Tor Økland Barstad

7d

3

16 Research request (alignment strategy): Deep dive on "making AI solve alignment for us"

JanBrauner

19d

3

73 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

10 Provably Honest - A First Step

Srijanak De

1mo

2

9 Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad

6mo

7

80 Air Conditioner Test Results & Discussion

johnswentworth

6mo

38

8 AI-assisted list of ten concrete alignment things to do right now

lcmgcd

3mo

5

118 Supervise Process, not Outcomes

stuhlmueller

8mo

8

92 Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie

3mo

4

14 Making it harder for an AGI to "trick" us, with STVs

Tor Økland Barstad

5mo

5

3 A Deceptively Simple Argument in favor of Problem Factorization

Logan Zoellner

4mo

4

16 Discussion on utilizing AI for alignment

elifland

3mo

3

8 Sufficiently many Godzillas as an alignment strategy

142857

3mo

3

36 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

27 AI Safety via Debate

ESRogs

4y

13

20 Briefly thinking through some analogs of debate

Eli Tyre

3mo

3

42 A Small Negative Result on Debate

Sam Bowman

8mo

11

94 Writeup: Progress on AI Safety via Debate

Beth Barnes

2y

18

68 A guide to Iterated Amplification & Debate

Rafael Harth

2y

10

36 AI Safety Debate and Its Applications

VojtaKovarik

3y

5

9 Debate AI and the Decision to Release an AI

Chris_Leong

3y

18

13 Splitting Debate up into Two Subsystems

Nandi

2y

5

31 Learning the smooth prior

Geoffrey Irving

7mo

0

92 Imitative Generalisation (AKA 'Learning the Prior')

Beth Barnes

1y

14

73 Why I'm excited about Debate

Richard_Ngo

1y

12

32 New paper: (When) is Truth-telling Favored in AI debate?

VojtaKovarik

2y

7

12 Thoughts on "AI safety via debate"

Gordon Seidoh Worley

4y

4