Tags: AI (1855 posts), SERI MATS, AI Sentience, Distributional Shifts, AI Robustness, Truthful AI, Adversarial Examples, Careers (185 posts), Audio, Interviews, Infra-Bayesianism, Organization Updates, AXRP, Formal Proof, Redwood Research, Domain Theory, Adversarial Training
Posts (karma | title | author | posted | comments):

40 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
108 | The next decades might be wild | Marius Hobbhahn | 5d | 21
0 | I believe some AI doomers are overconfident | FTPickle | 6h | 4
33 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
22 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
13 | Will Machines Ever Rule the World? MLAISU W50 | Esben Kran | 4d | 4
95 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39
160 | AGI Safety FAQ / all-dumb-questions-allowed thread | Aryeh Englander | 6mo | 514
-3 | Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) | Remmelt | 1d | 6
128 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
29 | If Wentworth is right about natural abstractions, it would be bad for alignment | Wuschel Schulz | 12d | 5
73 | Revisiting algorithmic progress | Tamay | 7d | 6
44 | Predicting GPU performance | Marius Hobbhahn | 6d | 24
31 | Is the AI timeline too short to have children? | Yoreth | 6d | 20
5 | What about non-degree seeking? | Lao Mein | 3d | 5
6 | [ASoT] Reflectivity in Narrow AI | Ulisse Mini | 29d | 1
28 | Where to be an AI Safety Professor | scasper | 13d | 12
39 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
16 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9
96 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
17 | Latent Adversarial Training | Adam Jermyn | 5mo | 9
2 | Vanessa Kosoy's PreDCA, distilled | Martín Soto | 1mo | 17
93 | Infra-Bayesian physicalism: a formal theory of naturalized induction | Vanessa Kosoy | 1y | 20
96 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48
109 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
4 | Guardian AI (Misaligned systems are all around us.) | Jessica Mary | 25d | 6
54 | Neural Tangent Kernel Distillation | Thomas Larsen | 2mo | 20
20 | Why I'm Working On Model Agnostic Interpretability | Jessica Mary | 1mo | 9