Tree of Tags

57 posts Language Models Transformers

5 posts Anthropic Transformer Circuits

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

98 Did ChatGPT just gaslight me?

ThomasW

19d

45

136 Simulators

janus

3mo

103

17 Shh, don't tell the AI it's likely to be evil

naterush

14d

9

-4 Simulators and Mindcrime

DragonGod

11d

4

23 Does a LLM have a utility function?

Dagon

11d

6

17 Gliders in Language Models

Alexandre Variengien

25d

11

55 A brainteaser for language models

Adam Scherlis

8d

3

155 Language models seem to be much better than humans at next-token prediction

Buck

4mo

56

16 An exploration of GPT-2's embedding weights

Adam Scherlis

7d

2

37 Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy

2mo

14

39 They gave LLMs access to physics simulators

ryan_b

2mo

18

38 Discovering Latent Knowledge in Language Models Without Supervision

Xodarap

6d

1

71 Inverse Scaling Prize: Round 1 Winners

Ethan Perez

2mo

16

3 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

63 Toy Models of Superposition

evhub

3mo

2

10 Understanding the tensor product formulation in Transformer Circuits

Tom Lieberum

12mo

2

59 A Summary Of Anthropic's First Paper

Sam Ringer

11mo

0

7 Mechanistic Interpretability for the MLP Layers (rough early thoughts)

MadHatter

12mo

2