57 posts
Language Models
Transformers
5 posts
Anthropic
Transformer Circuits
Discovering Language Model Behaviors with Model-Written Evaluations, by evhub (4h; 27 karma, 3 comments)
Did ChatGPT just gaslight me?, by ThomasW (19d; 98 karma, 45 comments)
Simulators, by janus (3mo; 136 karma, 103 comments)
Shh, don't tell the AI it's likely to be evil, by naterush (14d; 17 karma, 9 comments)
Simulators and Mindcrime, by DragonGod (11d; -4 karma, 4 comments)
Does a LLM have a utility function?, by Dagon (11d; 23 karma, 6 comments)
Gliders in Language Models, by Alexandre Variengien (25d; 17 karma, 11 comments)
A brainteaser for language models, by Adam Scherlis (8d; 55 karma, 3 comments)
Language models seem to be much better than humans at next-token prediction, by Buck (4mo; 155 karma, 56 comments)
An exploration of GPT-2's embedding weights, by Adam Scherlis (7d; 16 karma, 2 comments)
Paper: Large Language Models Can Self-improve [Linkpost], by Evan R. Murphy (2mo; 37 karma, 14 comments)
They gave LLMs access to physics simulators, by ryan_b (2mo; 39 karma, 18 comments)
Discovering Latent Knowledge in Language Models Without Supervision, by Xodarap (6d; 38 karma, 1 comment)
Inverse Scaling Prize: Round 1 Winners, by Ethan Perez (2mo; 71 karma, 16 comments)
Will research in AI risk jinx it? Consequences of training AI on AI risk arguments, by Yann Dubois (1d; 3 karma, 6 comments)
Toy Models of Superposition, by evhub (3mo; 63 karma, 2 comments)
Understanding the tensor product formulation in Transformer Circuits, by Tom Lieberum (12mo; 10 karma, 2 comments)
A Summary Of Anthropic's First Paper, by Sam Ringer (11mo; 59 karma, 0 comments)
Mechanistic Interpretability for the MLP Layers (rough early thoughts), by MadHatter (12mo; 7 karma, 2 comments)