22 posts
Conjecture (org)
Project Announcement
Encultured AI (org)
11 posts
Refine
Analogy
80
[Interim research report] Taking features out of superposition with sparse autoencoders
Lee Sharkey
7d
10
159
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren
22d
27
183
Conjecture: a retrospective after 8 months of work
Connor Leahy
27d
9
213
Mysteries of mode collapse
janus
1mo
35
103
What I Learned Running Refine
adamShimi
26d
5
85
Conjecture Second Hiring Round
Connor Leahy
27d
0
64
Searching for Search
NicholasKees
22d
6
82
Current themes in mechanistic interpretability research
Lee Sharkey
1mo
3
123
Interpreting Neural Networks through the Polytope Lens
Sid Black
2mo
26
103
Announcing Encultured AI: Building a Video Game
Andrew_Critch
4mo
26
186
We Are Conjecture, A New Alignment Research Startup
Connor Leahy
8mo
24
94
Circumventing interpretability: How to defeat mind-readers
Lee Sharkey
5mo
8
78
How to Diversify Conceptual Alignment: the Model Behind Refine
adamShimi
5mo
11
123
Refine: An Incubator for Conceptual Alignment Research Bets
adamShimi
8mo
13
49
My Thoughts on the ML Safety Course
zeshen
2mo
3
24
Embedding safety in ML development
zeshen
1mo
1
53
I missed the crux of the alignment problem the whole time
zeshen
4mo
7
28
confusion about alignment requirements
carado
2mo
10
25
(Structural) Stability of Coupled Optimizers
Paul Bricman
2mo
0
39
the Insulated Goal-Program idea
carado
4mo
3
23
Refine Blogpost Day #3: The shortforms I did write
Alexander Gietelink Oldenziel
3mo
0
27
goal-program bricks
carado
4mo
2
25
Benchmarking Proposals on Risk Scenarios
Paul Bricman
4mo
2
18
Refine's Third Blog Post Day/Week
adamShimi
3mo
0
24
Steelmining via Analogy
Paul Bricman
4mo
0