22 posts
Conjecture (org)
Project Announcement
Encultured AI (org)
11 posts
Refine
Analogy
Score | Title | Author | Age | Comments
96 | [Interim research report] Taking features out of superposition with sparse autoencoders | Lee Sharkey | 7d | 10
222 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
223 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9
248 | Mysteries of mode collapse | janus | 1mo | 35
97 | Searching for Search | NicholasKees | 22d | 6
118 | Conjecture Second Hiring Round | Connor Leahy | 27d | 0
98 | What I Learned Running Refine | adamShimi | 26d | 5
123 | Current themes in mechanistic interpretability research | Lee Sharkey | 1mo | 3
190 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
254 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24
101 | Announcing Encultured AI: Building a Video Game | Andrew_Critch | 4mo | 26
127 | Circumventing interpretability: How to defeat mind-readers | Lee Sharkey | 5mo | 8
170 | Refine: An Incubator for Conceptual Alignment Research Bets | adamShimi | 8mo | 13
83 | Abstracting The Hardness of Alignment: Unbounded Atomic Optimization | adamShimi | 4mo | 3
56 | My Thoughts on the ML Safety Course | zeshen | 2mo | 3
25 | Embedding safety in ML development | zeshen | 1mo | 1
32 | confusion about alignment requirements | carado | 2mo | 10
58 | I missed the crux of the alignment problem the whole time | zeshen | 4mo | 7
33 | Refine Blogpost Day #3: The shortforms I did write | Alexander Gietelink Oldenziel | 3mo | 0
25 | (Structural) Stability of Coupled Optimizers | Paul Bricman | 2mo | 0
36 | Benchmarking Proposals on Risk Scenarios | Paul Bricman | 4mo | 2
37 | the Insulated Goal-Program idea | carado | 4mo | 3
23 | Refine's Third Blog Post Day/Week | adamShimi | 3mo | 0
33 | Steelmining via Analogy | Paul Bricman | 4mo | 0
29 | goal-program bricks | carado | 4mo | 2