Tree of Tags

2595 posts: AI, AI Timelines, AI Takeoff, Interpretability (ML & AI), Careers, Instrumental Convergence, Iterated Amplification, Corrigibility, Audio, Debate (AI safety technique), Infra-Bayesianism, DeepMind

488 posts: GPT, Conjecture (org), Art, Music, Machine Learning (ML), Bounties & Prizes (active), OpenAI, QURI, Language Models, Project Announcement, DALL-E, Meta-Humor

84 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

41 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

5 Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Akash

2h

0

16 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

198 The next decades might be wild

Marius Hobbhahn

5d

21

265 AI alignment is distinct from its near-term applications

paulfchristiano

7d

5

140 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

6 I believe some AI doomers are overconfident

FTPickle

6h

4

5 Career Scouting: Housing Coordination

koratkar

5h

0

13 Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner

19h

0

6 (Extremely) Naive Gradient Hacking Doesn't Work

ojorgensen

9h

0

71 Proper scoring rules don’t guarantee predicting fixed points

Johannes_Treutlein

4d

2

19 Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt

1d

6

323 A challenge for AGI organizations, and a challenge for readers

Rob Bensinger

19d

30

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

112 Bad at Arithmetic, Promising at Math

cohenmacaulay

2d

17

47 Next Level Seinfeld

Zvi

1d

6

32 Take 11: "Aligning language models" should be weirder.

Charlie Steiner

2d

0

314 Jailbreaking ChatGPT on Release Day

Zvi

18d

74

101 [Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey

7d

10

15 Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon

1d

2

7 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

52 Discovering Latent Knowledge in Language Models Without Supervision

Xodarap

6d

1

234 Conjecture: a retrospective after 8 months of work

Connor Leahy

27d

9

148 Did ChatGPT just gaslight me?

ThomasW

19d

45

808 Simulators

janus

3mo

103

262 Mysteries of mode collapse

janus

1mo

35

36 An exploration of GPT-2's embedding weights

Adam Scherlis

7d

2