Tag cluster (153 posts): Interpretability (ML & AI) · GPT · Machine Learning (ML) · DeepMind · OpenAI · Truth, Semantics, & Meaning · AI Success Models · Bounties & Prizes (active) · AI-assisted Alignment · Lottery Ticket Hypothesis · Computer Science · Honesty
Tag cluster (168 posts): Conjecture (org) · Oracle AI · Myopia · Language Models · Refine · Deconfusion · Agency · AI Boxing (Containment) · Deceptive Alignment · Scaling Laws · Deception · Acausal Trade
Top posts (score · title · author · age · comments):

422 · A Mechanistic Interpretability Analysis of Grokking · Neel Nanda · 4mo · 39 comments
410 · DeepMind alignment team opinions on AGI ruin arguments · Vika · 4mo · 34 comments
307 · A challenge for AGI organizations, and a challenge for readers · Rob Bensinger · 19d · 30 comments
255 · New Scaling Laws for Large Language Models · 1a3orn · 8mo · 21 comments
253 · Common misconceptions about OpenAI · Jacob_Hilton · 3mo · 138 comments
239 · The Plan - 2022 Update · johnswentworth · 19d · 33 comments
227 · Chris Olah’s views on AGI safety · evhub · 3y · 38 comments
175 · Godzilla Strategies · johnswentworth · 6mo · 65 comments
173 · A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] · Dan H · 7mo · 5 comments
171 · interpreting GPT: the logit lens · nostalgebraist · 2y · 32 comments
170 · The case for aligning narrowly superhuman models · Ajeya Cotra · 1y · 74 comments
169 · the scaling “inconsistency”: openAI’s new insight · nostalgebraist · 2y · 14 comments
156 · Understanding “Deep Double Descent” · evhub · 3y · 51 comments
155 · Developmental Stages of GPTs · orthonormal · 2y · 74 comments
Top posts (score · title · author · age · comments):

759 · Simulators · janus · 3mo · 103 comments
494 · chinchilla's wild implications · nostalgebraist · 4mo · 114 comments
324 · The Parable of Predict-O-Matic · abramdemski · 3y · 42 comments
254 · We Are Conjecture, A New Alignment Research Startup · Connor Leahy · 8mo · 24 comments
248 · Mysteries of mode collapse · janus · 1mo · 35 comments
223 · Conjecture: a retrospective after 8 months of work · Connor Leahy · 27d · 9 comments
222 · The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable · beren · 22d · 27 comments
191 · Announcing the Inverse Scaling Prize ($250k Prize Pool) · Ethan Perez · 5mo · 14 comments
190 · Interpreting Neural Networks through the Polytope Lens · Sid Black · 2mo · 26 comments
170 · Refine: An Incubator for Conceptual Alignment Research Bets · adamShimi · 8mo · 13 comments
165 · Language models seem to be much better than humans at next-token prediction · Buck · 4mo · 56 comments
140 · Who models the models that model models? An exploration of GPT-3's in-context model fitting ability · Lovre · 6mo · 14 comments
139 · Decision theory does not imply that we get to have nice things · So8res · 2mo · 53 comments
130 · Beyond Astronomical Waste · Wei_Dai · 4y · 41 comments