Tags (51 posts): Machine Learning (ML); DeepMind; OpenAI; Truth, Semantics, & Meaning; Lottery Ticket Hypothesis; Honesty; Anthropic; Map and Territory; Calibration

Tags (52 posts): Interpretability (ML & AI); AI Success Models; Conservatism (AI); Principal-Agent Problems; Market making (AI safety technique); Empiricism
Posts:

Karma | Title | Author | Posted | Comments
35  | Reframing inner alignment | davidad | 9d | 13
307 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
253 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
140 | Clarifying AI X-risk | zac_kenton | 1mo | 23
74  | Paper: Discovering novel algorithms with AlphaTensor [Deepmind] | LawrenceC | 2mo | 18
55  | A Data limited future | Donald Hobson | 4mo | 25
410 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
47  | Steganography in Chain of Thought Reasoning | A Ray | 4mo | 13
104 | Caution when interpreting Deepmind's In-context RL paper | Sam Marks | 1mo | 6
37  | Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind] | LawrenceC | 1mo | 5
34  | Prosaic AI alignment | paulfchristiano | 4y | 10
97  | Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible | Sam Bowman | 3mo | 6
24  | Train first VS prune first in neural networks. | Donald Hobson | 5mo | 5
49  | Autonomy as taking responsibility for reference maintenance | Ramana Kumar | 4mo | 3
15  | An Open Agency Architecture for Safe Transformative AI | davidad | 11h | 11
132 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
20  | Paper: Transformers learn in-context by gradient descent | LawrenceC | 4d | 11
239 | The Plan - 2022 Update | johnswentworth | 19d | 33
53  | Multi-Component Learning and S-Curves | Adam Jermyn | 20d | 24
43  | "Cars and Elephants": a handwavy argument/analogy against mechanistic interpretability | David Scott Krueger (formerly: capybaralet) | 1mo | 25
33  | Extracting and Evaluating Causal Direction in LLMs' Activations | Fabien Roger | 6d | 2
30  | A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien) | Neel Nanda | 1mo | 15
81  | Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? | Neel Nanda | 1mo | 14
29  | Toy Models and Tegum Products | Adam Jermyn | 1mo | 7
83  | Polysemanticity and Capacity in Neural Networks | Buck | 2mo | 9
75  | Engineering Monosemanticity in Toy Models | Adam Jermyn | 1mo | 6
29  | Subsets and quotients in interpretability | Erik Jenner | 18d | 1
422 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39