86 posts
Interpretability (ML & AI)
Lottery Ticket Hypothesis
9 posts
AI Success Models
Conservatism (AI)
Market making (AI safety technique)
Verification
A Mechanistic Interpretability Analysis of Grokking (Neel Nanda, 4mo; 446 points, 39 comments)
The Plan - 2022 Update (johnswentworth, 19d; 250 points, 33 comments)
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable (beren, 22d; 235 points, 27 comments)
Interpreting Neural Networks through the Polytope Lens (Sid Black, 2mo; 201 points, 26 comments)
Understanding “Deep Double Descent” (evhub, 3y; 164 points, 51 comments)
Re-Examining LayerNorm (Eric Winsor, 19d; 151 points, 8 comments)
Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers (lifelonglearner, 1y; 140 points, 16 comments)
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme (Collin, 5d; 140 points, 18 comments)
Circumventing interpretability: How to defeat mind-readers (Lee Sharkey, 5mo; 134 points, 8 comments)
A Longlist of Theories of Impact for Interpretability (Neel Nanda, 9mo; 131 points, 29 comments)
The case for becoming a black-box investigator of language models (Buck, 7mo; 128 points, 19 comments)
A transparency and interpretability tech tree (evhub, 6mo; 123 points, 10 comments)
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" (Rob Bensinger, 1y; 114 points, 13 comments)
Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc (johnswentworth, 6mo; 106 points, 52 comments)
Conversation with Eliezer: What do you want the system to do? (Akash, 5mo; 113 points, 38 comments)
Various Alignment Strategies (and how likely they are to work) (Logan Zoellner, 7mo; 81 points, 34 comments)
A positive case for how we might succeed at prosaic AI alignment (evhub, 1y; 75 points, 47 comments)
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios (Evan R. Murphy, 7mo; 70 points, 0 comments)
Solving the whole AGI control problem, version 0.0001 (Steven Byrnes, 1y; 57 points, 7 comments)
Pessimism About Unknown Unknowns Inspires Conservatism (michaelcohen, 2y; 29 points, 2 comments)
If AGI were coming in a year, what should we do? (MichaelStJules, 8mo; 26 points, 16 comments)
RFC: Philosophical Conservatism in AI Alignment Research (Gordon Seidoh Worley, 4y; 17 points, 13 comments)
An Open Agency Architecture for Safe Transformative AI (davidad, 11h; 16 points, 11 comments)