Tree of Tags

49 posts Conjecture (org) Refine Project Announcement Encultured AI (org)

63 posts Language Models Anthropic Exploratory Engineering Transformer Circuits Transformers

213 Mysteries of mode collapse

janus

1mo

35

186 We Are Conjecture, A New Alignment Research Startup

Connor Leahy

8mo

24

183 Conjecture: a retrospective after 8 months of work

Connor Leahy

27d

9

176 Connor Leahy on Dying with Dignity, EleutherAI and Conjecture

Michaël Trazzi

5mo

29

123 Refine: An Incubator for Conceptual Alignment Research Bets

adamShimi

8mo

13

103 What I Learned Running Refine

adamShimi

26d

5

103 Announcing Encultured AI: Building a Video Game

Andrew_Critch

4mo

26

103 Understanding Conjecture: Notes from Connor Leahy interview

Akash

3mo

24

85 Conjecture Second Hiring Round

Connor Leahy

27d

0

82 Current themes in mechanistic interpretability research

Lee Sharkey

1mo

3

80 [Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey

7d

10

78 How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi

5mo

11

64 Searching for Search

NicholasKees

22d

6

64 Announcing the Vitalik Buterin Fellowships in AI Existential Safety!

DanielFilan

1y

2

472 Simulators

janus

3mo

103

223 New Scaling Laws for Large Language Models

1a3orn

8mo

21

187 The case for aligning narrowly superhuman models

Ajeya Cotra

1y

74

164 Language models seem to be much better than humans at next-token prediction

Buck

4mo

56

142 Transformer Circuits

evhub

12mo

4

123 Did ChatGPT just gaslight me?

ThomasW

19d

45

112 Who models the models that model models? An exploration of GPT-3's in-context model fitting ability

Lovre

6mo

14

110 GPT-3 Catching Fish in Morse Code

Megan Kinniment

5mo

27

103 Testing PaLM prompts on GPT3

Yitz

8mo

15

96 Paper: Teaching GPT3 to express uncertainty in words

Owain_Evans

6mo

7

94 Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes

5mo

6

90 RL with KL penalties is better seen as Bayesian inference

Tomek Korbak

6mo

15

88 Inverse Scaling Prize: Round 1 Winners

Ethan Perez

2mo

16

84 A one-question Turing test for GPT-3

Paul Crowley

11mo

23