Tags: Distributional Shifts (2 posts) · SERI MATS (26 posts)
Karma | Title | Author | Posted | Comments
17  | Distribution Shifts and The Importance of AI Safety | Leon Lang | 2mo | 2
6   | Mesa-optimization for goals defined only within a training environment is dangerous | Rubi J. Hudson | 4mo | 2
55  | Proper scoring rules don't guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
15  | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9
71  | SERI MATS Program - Winter 2022 Cohort | Ryan Kidd | 2mo | 12
117 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48
68  | Neural Tangent Kernel Distillation | Thomas Larsen | 2mo | 20
26  | The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) | Jessica Mary | 1mo | 2
7   | Working towards AI alignment is better | Johannes C. Mayer | 11d | 2
28  | Why I'm Working On Model Agnostic Interpretability | Jessica Mary | 1mo | 9
15  | Guardian AI (Misaligned systems are all around us.) | Jessica Mary | 25d | 6
103 | Externalized reasoning oversight: a research direction for language model alignment | tamera | 4mo | 22
28  | Auditing games for high-level interpretability | Paul Colognese | 1mo | 1
37  | Framing AI Childhoods | David Udell | 3mo | 8
35  | Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism | Spencer Becker-Kahn | 3mo | 4
14  | What sorts of systems can be deceptive? | Andrei Alexandru | 1mo | 0