Tree of Tags

51 posts: Machine Learning (ML); DeepMind; OpenAI; Truth, Semantics, & Meaning; Lottery Ticket Hypothesis; Honesty; Anthropic; Map and Territory; Calibration

52 posts: Interpretability (ML & AI); AI Success Models; Conservatism (AI); Principal-Agent Problems; Market making (AI safety technique); Empiricism

265 A challenge for AGI organizations, and a challenge for readers

Rob Bensinger

19d

30

47 Reframing inner alignment

davidad

9d

13

364 DeepMind alignment team opinions on AGI ruin arguments

Vika

4mo

34

20 My thoughts on OpenAI's Alignment plan

Donald Hobson

10d

0

104 Caution when interpreting Deepmind's In-context RL paper

Sam Marks

1mo

6

102 Clarifying AI X-risk

zac_kenton

1mo

23

226 Common misconceptions about OpenAI

Jacob_Hilton

3mo

138

80 Paper: Discovering novel algorithms with AlphaTensor [Deepmind]

LawrenceC

2mo

18

89 Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible

Sam Bowman

3mo

6

64 Toy Models of Superposition

evhub

3mo

2

44 Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA

Marius Hobbhahn

2mo

11

28 Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

LawrenceC

1mo

5

89 Safety Implications of LeCun's path to machine intelligence

Ivan Vendrov

5mo

16

125 A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]

Dan H

7mo

5

13 An Open Agency Architecture for Safe Transformative AI

davidad

11h

11

123 How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme

Collin

5d

18

211 The Plan - 2022 Update

johnswentworth

19d

33

26 Paper: Transformers learn in-context by gradient descent

LawrenceC

4d

11

99 Re-Examining LayerNorm

Eric Winsor

19d

8

22 Extracting and Evaluating Causal Direction in LLMs' Activations

Fabien Roger

6d

2

31 [ASoT] Natural abstractions and AlphaZero

Ulisse Mini

10d

1

57 Multi-Component Learning and S-Curves

Adam Jermyn

20d

24

72 Engineering Monosemanticity in Toy Models

Adam Jermyn

1mo

6

338 A Mechanistic Interpretability Analysis of Grokking

Neel Nanda

4mo

39

24 Subsets and quotients in interpretability

Erik Jenner

18d

1

68 Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

Neel Nanda

1mo

14

62 A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda

1mo

8

66 An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers

Neel Nanda

2mo

5