Tree of Tags

62 posts Language Models Anthropic Transformer Circuits Transformers

1 post Exploratory Engineering

27 Discovering Language Model Behaviors with Model-Written Evaluations

evhub

4h

3

32 Take 11: "Aligning language models" should be weirder.

Charlie Steiner

2d

0

7 Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

1d

6

52 Discovering Latent Knowledge in Language Models Without Supervision

Xodarap

6d

1

148 Did ChatGPT just gaslight me?

ThomasW

19d

45

808 Simulators

janus

3mo

103

36 An exploration of GPT-2's embedding weights

Adam Scherlis

7d

2

37 A brainteaser for language models

Adam Scherlis

8d

3

21 Shh, don't tell the AI it's likely to be evil

naterush

14d

9

37 Gliders in Language Models

Alexandre Variengien

25d

11

173 Language models seem to be much better than humans at next-token prediction

Buck

4mo

56

105 Inverse Scaling Prize: Round 1 Winners

Ethan Perez

2mo

16

9 Does a LLM have a utility function?

Dagon

11d

6

61 They gave LLMs access to physics simulators

ryan_b

2mo

18

14 von Neumann probes and Dyson spheres: what exploratory engineering can tell us about the Fermi paradox

Stuart_Armstrong

10y

21