Tree of Tags

46 posts: Factored Cognition, Experiments, Ought, AI-assisted Alignment, Memory and Mnemonics, Air Conditioning

23 posts: Debate (AI safety technique)

184 Godzilla Strategies

johnswentworth

6mo

65

11 Alignment with argument-networks and assessment-predictions

Tor Økland Barstad

7d

3

21 Research request (alignment strategy): Deep dive on "making AI solve alignment for us"

JanBrauner

19d

3

62 Rant on Problem Factorization for Alignment

johnswentworth

4mo

48

17 Provably Honest - A First Step

Srijanak De

1mo

2

13 Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad

6mo

7

80 Air Conditioner Test Results & Discussion

johnswentworth

6mo

38

12 AI-assisted list of ten concrete alignment things to do right now

lcmgcd

3mo

5

120 Supervise Process, not Outcomes

stuhlmueller

8mo

8

105 Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie

3mo

4

28 Making it harder for an AGI to "trick" us, with STVs

Tor Økland Barstad

5mo

5

5 A Deceptively Simple Argument in favor of Problem Factorization

Logan Zoellner

4mo

4

21 Discussion on utilizing AI for alignment

elifland

3mo

3

11 Sufficiently many Godzillas as an alignment strategy

142857

3mo

3

26 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

20 AI Safety via Debate

ESRogs

4y

13

13 Briefly thinking through some analogs of debate

Eli Tyre

3mo

3

52 A Small Negative Result on Debate

Sam Bowman

8mo

11

97 Writeup: Progress on AI Safety via Debate

Beth Barnes

2y

18

74 A guide to Iterated Amplification & Debate

Rafael Harth

2y

10

46 AI Safety Debate and Its Applications

VojtaKovarik

3y

5

6 Debate AI and the Decision to Release an AI

Chris_Leong

3y

18

17 Splitting Debate up into Two Subsystems

Nandi

2y

5

32 Learning the smooth prior

Geoffrey Irving

7mo

0

82 Imitative Generalisation (AKA 'Learning the Prior')

Beth Barnes

1y

14

69 Why I'm excited about Debate

Richard_Ngo

1y

12

30 New paper: (When) is Truth-telling Favored in AI debate?

VojtaKovarik

2y

7

8 Thoughts on "AI safety via debate"

Gordon Seidoh Worley

4y

4