51 posts
Machine Learning (ML)
DeepMind
OpenAI
Truth, Semantics, & Meaning
Lottery Ticket Hypothesis
Honesty
Anthropic
Map and Territory
Calibration
52 posts
Interpretability (ML & AI)
AI Success Models
Conservatism (AI)
Principal-Agent Problems
Market making (AI safety technique)
Empiricism
265
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger
19d
30
47
Reframing inner alignment
davidad
9d
13
364
DeepMind alignment team opinions on AGI ruin arguments
Vika
4mo
34
20
My thoughts on OpenAI's Alignment plan
Donald Hobson
10d
0
104
Caution when interpreting Deepmind's In-context RL paper
Sam Marks
1mo
6
102
Clarifying AI X-risk
zac_kenton
1mo
23
226
Common misconceptions about OpenAI
Jacob_Hilton
3mo
138
80
Paper: Discovering novel algorithms with AlphaTensor [Deepmind]
LawrenceC
2mo
18
89
Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible
Sam Bowman
3mo
6
64
Toy Models of Superposition
evhub
3mo
2
44
Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA
Marius Hobbhahn
2mo
11
28
Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]
LawrenceC
1mo
5
89
Safety Implications of LeCun's path to machine intelligence
Ivan Vendrov
5mo
16
125
A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]
Dan H
7mo
5
13
An Open Agency Architecture for Safe Transformative AI
davidad
11h
11
123
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
211
The Plan - 2022 Update
johnswentworth
19d
33
26
Paper: Transformers learn in-context by gradient descent
LawrenceC
4d
11
99
Re-Examining LayerNorm
Eric Winsor
19d
8
22
Extracting and Evaluating Causal Direction in LLMs' Activations
Fabien Roger
6d
2
31
[ASoT] Natural abstractions and AlphaZero
Ulisse Mini
10d
1
57
Multi-Component Learning and S-Curves
Adam Jermyn
20d
24
72
Engineering Monosemanticity in Toy Models
Adam Jermyn
1mo
6
338
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda
4mo
39
24
Subsets and quotients in interpretability
Erik Jenner
18d
1
68
Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?
Neel Nanda
1mo
14
62
A Barebones Guide to Mechanistic Interpretability Prerequisites
Neel Nanda
1mo
8
66
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers
Neel Nanda
2mo
5