49 posts
Conjecture (org)
Refine
Project Announcement
Encultured AI (org)
63 posts
Language Models
Anthropic
Exploratory Engineering
Transformer Circuits
Transformers
59
[Interim research report] Taking features out of superposition with sparse autoencoders
Lee Sharkey
7d
10
132
Conjecture: a retrospective after 8 months of work
Connor Leahy
27d
9
103
What I Learned Running Refine
adamShimi
26d
5
164
Mysteries of mode collapse
janus
1mo
35
22
Tradeoffs in complexity, abstraction, and generality
remember
8d
0
42
The First Filter
adamShimi
24d
5
45
Conjecture Second Hiring Round
Connor Leahy
27d
0
25
Searching for Search
NicholasKees
22d
6
34
Current themes in mechanistic interpretability research
Lee Sharkey
1mo
3
101
Understanding Conjecture: Notes from Connor Leahy interview
Akash
3mo
24
100
Announcing Encultured AI: Building a Video Game
Andrew_Critch
4mo
26
118
Connor Leahy on Dying with Dignity, EleutherAI and Conjecture
Michaël Trazzi
5mo
29
12
Good Futures Initiative: Winter Project Internship
Aris
23d
4
22
Embedding safety in ML development
zeshen
1mo
1
27
Discovering Language Model Behaviors with Model-Written Evaluations
evhub
4h
3
26
Take 11: "Aligning language models" should be weirder.
Charlie Steiner
2d
0
55
A brainteaser for language models
Adam Scherlis
8d
3
38
Discovering Latent Knowledge in Language Models Without Supervision
Xodarap
6d
1
98
Did ChatGPT just gaslight me?
ThomasW
19d
45
3
Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois
1d
6
16
An exploration of GPT-2's embedding weights
Adam Scherlis
7d
2
23
Does a LLM have a utility function?
Dagon
11d
6
17
Shh, don't tell the AI it's likely to be evil
naterush
14d
9
136
Simulators
janus
3mo
103
155
Language models seem to be much better than humans at next-token prediction
Buck
4mo
56
71
Inverse Scaling Prize: Round 1 Winners
Ethan Perez
2mo
16
17
Gliders in Language Models
Alexandre Variengien
25d
11
63
Toy Models of Superposition
evhub
3mo
2