Tree of Tags

47 posts Interpretability (ML & AI) Empiricism

5 posts AI Success Models Conservatism (AI) Principal-Agent Problems Market making (AI safety technique)

123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
26 | Paper: Transformers learn in-context by gradient descent | LawrenceC | 4d | 11
99 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
22 | Extracting and Evaluating Causal Direction in LLMs' Activations | Fabien Roger | 6d | 2
31 | [ASoT] Natural abstractions and AlphaZero | Ulisse Mini | 10d | 1
57 | Multi-Component Learning and S-Curves | Adam Jermyn | 20d | 24
72 | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
24 | Subsets and quotients in interpretability | Erik Jenner | 18d | 1
68 | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14
62 | A Barebones Guide to Mechanistic Interpretability Prerequisites | Neel Nanda | 1mo | 8
66 | An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers | Neel Nanda | 2mo | 5
78 | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9

13 | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11
45 | Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios | Evan R. Murphy | 7mo | 0
78 | A positive case for how we might succeed at prosaic AI alignment | evhub | 1y | 47
60 | Solving the whole AGI control problem, version 0.0001 | Steven Byrnes | 1y | 7
31 | Pessimism About Unknown Unknowns Inspires Conservatism | michaelcohen | 2y | 2