47 posts
Interpretability (ML & AI)
Empiricism
5 posts
AI Success Models
Conservatism (AI)
Principal-Agent Problems
Market making (AI safety technique)
254
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda
4mo
39
183
The Plan - 2022 Update
johnswentworth
19d
33
167
Chris Olah’s views on AGI safety
evhub
3y
38
155
A transparency and interpretability tech tree
evhub
6mo
10
146
Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers
lifelonglearner
1y
16
134
Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc
johnswentworth
6mo
52
114
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin
5d
18
113
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable
johnswentworth
4mo
8
102
Search versus design
Alex Flint
2y
41
88
A Longlist of Theories of Impact for Interpretability
Neel Nanda
9mo
29
73
Polysemanticity and Capacity in Neural Networks
Buck
2mo
9
69
Engineering Monosemanticity in Toy Models
Adam Jermyn
1mo
6
64
An Analytic Perspective on AI Alignment
DanielFilan
2y
45
63
How Do Selection Theorems Relate To Interpretability?
johnswentworth
6mo
14
85
A positive case for how we might succeed at prosaic AI alignment
evhub
1y
47
65
Solving the whole AGI control problem, version 0.0001
Steven Byrnes
1y
7
34
Pessimism About Unknown Unknowns Inspires Conservatism
michaelcohen
2y
2
24
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios
Evan R. Murphy
7mo
0
11
An Open Agency Architecture for Safe Transformative AI
davidad
11h
11