Score · Title · Author · Posted · Comments
84 · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20 comments
198 · The next decades might be wild · Marius Hobbhahn · 5d · 21 comments
6 · I believe some AI doomers are overconfident · FTPickle · 6h · 4 comments
41 · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6 comments
52 · Existential AI Safety is NOT separate from near-term applications · scasper · 7d · 15 comments
11 · Will Machines Ever Rule the World? MLAISU W50 · Esben Kran · 4d · 4 comments
89 · Trying to disambiguate different questions about whether RLHF is “good” · Buck · 6d · 39 comments
282 · AGI Safety FAQ / all-dumb-questions-allowed thread · Aryeh Englander · 6mo · 514 comments
19 · Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) · Remmelt · 1d · 6 comments
190 · Using GPT-Eliezer against ChatGPT Jailbreaking · Stuart_Armstrong · 14d · 77 comments
25 · If Wentworth is right about natural abstractions, it would be bad for alignment · Wuschel Schulz · 12d · 5 comments
111 · Revisiting algorithmic progress · Tamay · 7d · 6 comments
74 · Predicting GPU performance · Marius Hobbhahn · 6d · 24 comments
35 · Is the AI timeline too short to have children? · Yoreth · 6d · 20 comments
5 · What about non-degree seeking? · Lao Mein · 3d · 5 comments
6 · [ASoT] Reflectivity in Narrow AI · Ulisse Mini · 29d · 1 comment
32 · Where to be an AI Safety Professor · scasper · 13d · 12 comments
71 · Proper scoring rules don’t guarantee predicting fixed points · Johannes_Treutlein · 4d · 2 comments
14 · Is the "Valley of Confused Abstractions" real? · jacquesthibs · 15d · 9 comments
164 · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC · 17d · 9 comments
31 · Latent Adversarial Training · Adam Jermyn · 5mo · 9 comments
30 · Vanessa Kosoy's PreDCA, distilled · Martín Soto · 1mo · 17 comments
103 · Infra-Bayesian physicalism: a formal theory of naturalized induction · Vanessa Kosoy · 1y · 20 comments
138 · Taking the parameters which seem to matter and rotating them until they don't · Garrett Baker · 3mo · 48 comments
159 · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · maxnadeau · 1mo · 14 comments
26 · Guardian AI (Misaligned systems are all around us.) · Jessica Mary · 25d · 6 comments
82 · Neural Tangent Kernel Distillation · Thomas Larsen · 2mo · 20 comments
36 · Why I'm Working On Model Agnostic Interpretability · Jessica Mary · 1mo · 9 comments