Tree of Tags

2595 posts: AI, AI Timelines, AI Takeoff, Interpretability (ML & AI), Careers, Instrumental Convergence, Iterated Amplification, Corrigibility, Audio, Debate (AI safety technique), Infra-Bayesianism, DeepMind

488 posts: GPT, Conjecture (org), Art, Music, Machine Learning (ML), Bounties & Prizes (active), OpenAI, QURI, Language Models, Project Announcement, DALL-E, Meta-Humor

84 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

16 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

198 The next decades might be wild

Marius Hobbhahn

5d

21

6 I believe some AI doomers are overconfident

FTPickle

6h

4

41 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

52 Existential AI Safety is NOT separate from near-term applications

scasper

7d

15

26 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

11 Will Machines Ever Rule the World? MLAISU W50

Esben Kran

4d

4

140 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

89 Trying to disambiguate different questions about whether RLHF is “good”

Buck

6d

39

282 AGI Safety FAQ / all-dumb-questions-allowed thread

Aryeh Englander

6mo

514

19 Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt

1d

6

190 Using GPT-Eliezer against ChatGPT Jailbreaking

Stuart_Armstrong

14d

77

25 If Wentworth is right about natural abstractions, it would be bad for alignment

Wuschel Schulz

12d

5

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

37 Reframing inner alignment

davidad

9d

13

7 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

112 Bad at Arithmetic, Promising at Math

cohenmacaulay

2d

17

47 Next Level Seinfeld

Zvi

1d

6

314 Jailbreaking ChatGPT on Release Day

Zvi

18d

74

148 Did ChatGPT just gaslight me?

ThomasW

19d

45

101 [Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey

7d

10

19 A crisis for online communication: bots and bot users will overrun the Internet?

Mitchell_Porter

9d

11

15 Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon

1d

2

262 Mysteries of mode collapse

janus

1mo

35

15 [LINK] - ChatGPT discussion

JanBrauner

19d

7

-1 Could an AI be Religious?

mk54

16d

14

26 Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi

7d

7