Tree of Tags

49 posts: Conjecture (org), Refine, Project Announcement, Encultured AI (org)

63 posts: Language Models, Anthropic, Exploratory Engineering, Transformer Circuits, Transformers

59 [Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey

7d

10

132 Conjecture: a retrospective after 8 months of work

Connor Leahy

27d

9

103 What I Learned Running Refine

adamShimi

26d

5

164 Mysteries of mode collapse

janus

1mo

35

22 Tradeoffs in complexity, abstraction, and generality

remember

8d

0

42 The First Filter

adamShimi

24d

5

45 Conjecture Second Hiring Round

Connor Leahy

27d

0

25 Searching for Search

NicholasKees

22d

6

34 Current themes in mechanistic interpretability research

Lee Sharkey

1mo

3

101 Understanding Conjecture: Notes from Connor Leahy interview

Akash

3mo

24

100 Announcing Encultured AI: Building a Video Game

Andrew_Critch

4mo

26

118 Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi

5mo

29

12 Good Futures Initiative: Winter Project Internship

Aris

23d

4

22 Embedding safety in ML development

zeshen

1mo

1

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

26 Take 11: "Aligning language models" should be weirder.

Charlie Steiner

2d

0

55 A brainteaser for language models

Adam Scherlis

8d

3

38 Discovering Latent Knowledge in Language Models Without Supervision

Xodarap

6d

1

98 Did ChatGPT just gaslight me?

ThomasW

19d

45

3 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

16 An exploration of GPT-2's embedding weights

Adam Scherlis

7d

2

23 Does a LLM have a utility function?

Dagon

11d

6

17 Shh, don't tell the AI it's likely to be evil

naterush

14d

9

136 Simulators

janus

3mo

103

155 Language models seem to be much better than humans at next-token prediction

Buck

4mo

56

71 Inverse Scaling Prize: Round 1 Winners

Ethan Perez

2mo

16

17 Gliders in Language Models

Alexandre Variengien

25d

11

63 Toy Models of Superposition

evhub

3mo

2