153 posts: Interpretability (ML & AI) · GPT · Machine Learning (ML) · DeepMind · OpenAI · Truth, Semantics, & Meaning · AI Success Models · Bounties & Prizes (active) · AI-assisted Alignment · Lottery Ticket Hypothesis · Computer Science · Honesty
168 posts: Conjecture (org) · Oracle AI · Myopia · Language Models · Refine · Deconfusion · Agency · AI Boxing (Containment) · Deceptive Alignment · Scaling Laws · Deception · Acausal Trade
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34 |
| 338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39 |
| 265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 226 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138 |
| 223 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21 |
| 211 | The Plan - 2022 Update | johnswentworth | 19d | 33 |
| 197 | Chris Olah’s views on AGI safety | evhub | 3y | 38 |
| 187 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74 |
| 158 | interpreting GPT: the logit lens | nostalgebraist | 2y | 32 |
| 151 | Godzilla Strategies | johnswentworth | 6mo | 65 |
| 146 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14 |
| 140 | Developmental Stages of GPTs | orthonormal | 2y | 74 |
| 139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16 |
| 136 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13 |
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 472 | Simulators | janus | 3mo | 103 |
| 364 | chinchilla's wild implications | nostalgebraist | 4mo | 114 |
| 291 | The Parable of Predict-O-Matic | abramdemski | 3y | 42 |
| 213 | Mysteries of mode collapse | janus | 1mo | 35 |
| 186 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24 |
| 183 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9 |
| 166 | Announcing the Inverse Scaling Prize ($250k Prize Pool) | Ethan Perez | 5mo | 14 |
| 164 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56 |
| 159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27 |
| 142 | Transformer Circuits | evhub | 12mo | 4 |
| 142 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53 |
| 123 | Refine: An Incubator for Conceptual Alignment Research Bets | adamShimi | 8mo | 13 |
| 123 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26 |
| 119 | Beyond Astronomical Waste | Wei_Dai | 4y | 41 |