2595 posts
AI
AI Timelines
AI Takeoff
Interpretability (ML & AI)
Careers
Instrumental Convergence
Iterated Amplification
Corrigibility
Audio
Debate (AI safety technique)
Infra-Bayesianism
DeepMind
488 posts
GPT
Conjecture (org)
Art
Music
Machine Learning (ML)
Bounties & Prizes (active)
OpenAI
QURI
Language Models
Project Announcement
DALL-E
Meta-Humor
84
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
16
An Open Agency Architecture for Safe Transformative AI
davidad
11h
11
198
The next decades might be wild
Marius Hobbhahn
5d
21
6
I believe some AI doomers are overconfident
FTPickle
6h
4
41
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
52
Existential AI Safety is NOT separate from near-term applications
scasper
7d
15
26
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Charlie Steiner
8d
14
11
Will Machines Ever Rule the World? MLAISU W50
Esben Kran
4d
4
140
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
89
Trying to disambiguate different questions about whether RLHF is “good”
Buck
6d
39
282
AGI Safety FAQ / all-dumb-questions-allowed thread
Aryeh Englander
6mo
514
19
Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)
Remmelt
1d
6
190
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong
14d
77
25
If Wentworth is right about natural abstractions, it would be bad for alignment
Wuschel Schulz
12d
5
27
Discovering Language Model Behaviors with Model-Written Evaluations
evhub
4h
3
37
Reframing inner alignment
davidad
9d
13
7
Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois
1d
6
112
Bad at Arithmetic, Promising at Math
cohenmacaulay
2d
17
47
Next Level Seinfeld
Zvi
1d
6
314
Jailbreaking ChatGPT on Release Day
Zvi
18d
74
148
Did ChatGPT just gaslight me?
ThomasW
19d
45
101
[Interim research report] Taking features out of superposition with sparse autoencoders
Lee Sharkey
7d
10
19
A crisis for online communication: bots and bot users will overrun the Internet?
Mitchell_Porter
9d
11
15
Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]
Bill Benzon
1d
2
262
Mysteries of mode collapse
janus
1mo
35
15
[LINK] - ChatGPT discussion
JanBrauner
19d
7
-1
Could an AI be Religious?
mk54
16d
14
26
Is the ChatGPT-simulated Linux virtual machine real?
Kenoubi
7d
7