Tags: Distributional Shifts (2 posts) · SERI MATS (26 posts)
| Score | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 27 | Distribution Shifts and The Importance of AI Safety | Leon Lang | 2mo | 2 |
| 13 | Mesa-optimization for goals defined only within a training environment is dangerous | Rubi J. Hudson | 4mo | 2 |
| 71 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 13 | Working towards AI alignment is better | Johannes C. Mayer | 11d | 2 |
| 26 | Guardian AI (Misaligned systems are all around us.) | Jessica Mary | 25d | 6 |
| 14 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9 |
| 138 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48 |
| 81 | SERI MATS Program - Winter 2022 Cohort | Ryan Kidd | 2mo | 12 |
| 82 | Neural Tangent Kernel Distillation | Thomas Larsen | 2mo | 20 |
| 36 | Why I'm Working On Model Agnostic Interpretability | Jessica Mary | 1mo | 9 |
| 43 | Auditing games for high-level interpretability | Paul Colognese | 1mo | 1 |
| 134 | Externalized reasoning oversight: a research direction for language model alignment | tamera | 4mo | 22 |
| 24 | The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) | Jessica Mary | 1mo | 2 |
| 20 | What sorts of systems can be deceptive? | Andrei Alexandru | 1mo | 0 |
| 44 | Framing AI Childhoods | David Udell | 3mo | 8 |
| 42 | Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism | Spencer Becker-Kahn | 3mo | 4 |