Tree of Tags

2595 posts: AI, AI Timelines, AI Takeoff, Interpretability (ML & AI), Careers, Instrumental Convergence, Iterated Amplification, Corrigibility, Audio, Debate (AI safety technique), Infra-Bayesianism, DeepMind

488 posts: GPT, Conjecture (org), Art, Music, Machine Learning (ML), Bounties & Prizes (active), OpenAI, QURI, Language Models, Project Announcement, DALL-E, Meta-Humor

62 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

13 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

153 The next decades might be wild

Marius Hobbhahn

5d

21

3 I believe some AI doomers are overconfident

FTPickle

6h

4

37 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

37 Existential AI Safety is NOT separate from near-term applications

scasper

7d

15

36 Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.

Charlie Steiner

8d

14

12 Will Machines Ever Rule the World? MLAISU W50

Esben Kran

4d

4

123 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

92 Trying to disambiguate different questions about whether RLHF is “good”

Buck

6d

39

221 AGI Safety FAQ / all-dumb-questions-allowed thread

Aryeh Englander

6mo

514

8 Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt

1d

6

159 Using GPT-Eliezer against ChatGPT Jailbreaking

Stuart_Armstrong

14d

77

27 If Wentworth is right about natural abstractions, it would be bad for alignment

Wuschel Schulz

12d

5

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

47 Reframing inner alignment

davidad

9d

13

5 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

91 Bad at Arithmetic, Promising at Math

cohenmacaulay

2d

17

45 Next Level Seinfeld

Zvi

1d

6

237 Jailbreaking ChatGPT on Release Day

Zvi

18d

74

123 Did ChatGPT just gaslight me?

ThomasW

19d

45

80 [Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey

7d

10

23 A crisis for online communication: bots and bot users will overrun the Internet?

Mitchell_Porter

9d

11

13 Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon

1d

2

213 Mysteries of mode collapse

janus

1mo

35

13 [LINK] - ChatGPT discussion

JanBrauner

19d

7

-12 Could an AI be Religious?

mk54

16d

14

18 Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi

7d

7