Tree of Tags

55 posts: Language Models, Agency, Deconfusion, Scaling Laws, Tool AI, Definitions, Simulation Hypothesis, PaLM, Prompt Engineering, Philosophy of Language, Carving / Clustering Reality, Astronomical Waste

33 posts: Conjecture (org), Refine, Project Announcement, Encultured AI (org), Analogy

26 | Discovering Language Model Behaviors with Model-Written Evaluations | evhub | 4h | 3

759 | Simulators | janus | 3mo | 103

26 | Inverse scaling can become U-shaped | Edouard Harris | 1mo | 15

64 | Paper: Large Language Models Can Self-improve [Linkpost] | Evan R. Murphy | 2mo | 14

165 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56

101 | Inverse Scaling Prize: Round 1 Winners | Ethan Perez | 2mo | 16

49 | Smoke without fire is scary | Adam Jermyn | 2mo | 22

18 | Beware over-use of the agent model | Alex Flint | 1y | 10

21 | A Test for Language Model Consciousness | Ethan Perez | 3mo | 14

494 | chinchilla's wild implications | nostalgebraist | 4mo | 114

51 | Vingean Agency | abramdemski | 3mo | 13

58 | Conditioning Generative Models for Alignment | Jozdien | 5mo | 8

22 | Conditioning Generative Models | Adam Jermyn | 5mo | 18

21 | Disentangling inner alignment failures | Erik Jenner | 2mo | 5

96 | [Interim research report] Taking features out of superposition with sparse autoencoders | Lee Sharkey | 7d | 10

222 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27

248 | Mysteries of mode collapse | janus | 1mo | 35

223 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9

98 | What I Learned Running Refine | adamShimi | 26d | 5

190 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26

123 | Current themes in mechanistic interpretability research | Lee Sharkey | 1mo | 3

56 | My Thoughts on the ML Safety Course | zeshen | 2mo | 3

127 | Circumventing interpretability: How to defeat mind-readers | Lee Sharkey | 5mo | 8

37 | the Insulated Goal-Program idea | carado | 4mo | 3

34 | Encultured AI Pre-planning, Part 2: Providing a Service | Andrew_Critch | 4mo | 4

101 | Announcing Encultured AI: Building a Video Game | Andrew_Critch | 4mo | 26

83 | Abstracting The Hardness of Alignment: Unbounded Atomic Optimization | adamShimi | 4mo | 3

88 | How to Diversify Conceptual Alignment: the Model Behind Refine | adamShimi | 5mo | 11