Tags (55 posts): Language Models, Agency, Deconfusion, Scaling Laws, Tool AI, Definitions, Simulation Hypothesis, PaLM, Prompt Engineering, Philosophy of Language, Carving / Clustering Reality, Astronomical Waste
Tags (33 posts): Conjecture (org), Refine, Project Announcement, Encultured AI (org), Analogy
Karma | Title | Author | Age | Comments
26 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3
759 | Simulators | janus | 3mo | 103
26 | Inverse scaling can become U-shaped | Edouard Harris | 1mo | 15
64 | Paper: Large Language Models Can Self-improve [Linkpost] | Evan R. Murphy | 2mo | 14
165 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56
101 | Inverse Scaling Prize: Round 1 Winners | Ethan Perez | 2mo | 16
49 | Smoke without fire is scary | Adam Jermyn | 2mo | 22
18 | Beware over-use of the agent model | Alex Flint | 1y | 10
21 | A Test for Language Model Consciousness | Ethan Perez | 3mo | 14
494 | chinchilla's wild implications | nostalgebraist | 4mo | 114
51 | Vingean Agency | abramdemski | 3mo | 13
58 | Conditioning Generative Models for Alignment | Jozdien | 5mo | 8
22 | Conditioning Generative Models | Adam Jermyn | 5mo | 18
21 | Disentangling inner alignment failures | Erik Jenner | 2mo | 5
96 | [Interim research report] Taking features out of superposition with sparse autoencoders | Lee Sharkey | 7d | 10
222 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
248 | Mysteries of mode collapse | janus | 1mo | 35
223 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9
98 | What I Learned Running Refine | adamShimi | 26d | 5
190 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
123 | Current themes in mechanistic interpretability research | Lee Sharkey | 1mo | 3
56 | My Thoughts on the ML Safety Course | zeshen | 2mo | 3
127 | Circumventing interpretability: How to defeat mind-readers | Lee Sharkey | 5mo | 8
37 | the Insulated Goal-Program idea | carado | 4mo | 3
34 | Encultured AI Pre-planning, Part 2: Providing a Service | Andrew_Critch | 4mo | 4
101 | Announcing Encultured AI: Building a Video Game | Andrew_Critch | 4mo | 26
83 | Abstracting The Hardness of Alignment: Unbounded Atomic Optimization | adamShimi | 4mo | 3
88 | How to Diversify Conceptual Alignment: the Model Behind Refine | adamShimi | 5mo | 11