Tags: Distributional Shifts (2 posts) · SERI MATS (26 posts)
Karma | Title | Author | Posted | Comments
17  | Distribution Shifts and The Importance of AI Safety | Leon Lang | 2mo | 2
6   | Mesa-optimization for goals defined only within a training environment is dangerous | Rubi J. Hudson | 4mo | 2
55  | Proper scoring rules don't guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
15  | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9
71  | SERI MATS Program - Winter 2022 Cohort | Ryan Kidd | 2mo | 12
117 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48
68  | Neural Tangent Kernel Distillation | Thomas Larsen | 2mo | 20
26  | The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) | Jessica Mary | 1mo | 2
7   | Working towards AI alignment is better | Johannes C. Mayer | 11d | 2
28  | Why I'm Working On Model Agnostic Interpretability | Jessica Mary | 1mo | 9
15  | Guardian AI (Misaligned systems are all around us.) | Jessica Mary | 25d | 6
103 | Externalized reasoning oversight: a research direction for language model alignment | tamera | 4mo | 22
28  | Auditing games for high-level interpretability | Paul Colognese | 1mo | 1
37  | Framing AI Childhoods | David Udell | 3mo | 8
35  | Behaviour Manifolds and the Hessian of the Total Loss - Notes and Criticism | Spencer Becker-Kahn | 3mo | 4
14  | What sorts of systems can be deceptive? | Andrei Alexandru | 1mo | 0