Tags: Distributional Shifts (2 posts) · SERI MATS (26 posts)
| Score | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 27 | Distribution Shifts and The Importance of AI Safety | Leon Lang | 2mo | 2 |
| 13 | Mesa-optimization for goals defined only within a training environment is dangerous | Rubi J. Hudson | 4mo | 2 |
| 71 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 13 | Working towards AI alignment is better | Johannes C. Mayer | 11d | 2 |
| 26 | Guardian AI (Misaligned systems are all around us.) | Jessica Mary | 25d | 6 |
| 14 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9 |
| 138 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48 |
| 81 | SERI MATS Program - Winter 2022 Cohort | Ryan Kidd | 2mo | 12 |
| 82 | Neural Tangent Kernel Distillation | Thomas Larsen | 2mo | 20 |
| 36 | Why I'm Working On Model Agnostic Interpretability | Jessica Mary | 1mo | 9 |
| 43 | Auditing games for high-level interpretability | Paul Colognese | 1mo | 1 |
| 134 | Externalized reasoning oversight: a research direction for language model alignment | tamera | 4mo | 22 |
| 24 | The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) | Jessica Mary | 1mo | 2 |
| 20 | What sorts of systems can be deceptive? | Andrei Alexandru | 1mo | 0 |
| 44 | Framing AI Childhoods | David Udell | 3mo | 8 |
| 42 | Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism | Spencer Becker-Kahn | 3mo | 4 |