Branch 1 tags (153 posts):
Interpretability (ML & AI)
GPT
Machine Learning (ML)
DeepMind
OpenAI
Truth, Semantics, & Meaning
AI Success Models
Bounties & Prizes (active)
AI-assisted Alignment
Lottery Ticket Hypothesis
Computer Science
Honesty
Branch 2 tags (168 posts):
Conjecture (org)
Oracle AI
Myopia
Language Models
Refine
Deconfusion
Agency
AI Boxing (Containment)
Deceptive Alignment
Scaling Laws
Deception
Acausal Trade
Branch 1 top posts:

Karma | Title | Author | Age | Comments
318 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
254 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
223 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
204 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74
199 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
191 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
183 | The Plan - 2022 Update | johnswentworth | 19d | 33
167 | Chris Olah’s views on AGI safety | evhub | 3y | 38
164 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
155 | A transparency and interpretability tech tree | evhub | 6mo | 10
146 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
145 | interpreting GPT: the logit lens | nostalgebraist | 2y | 32
135 | How much chess engine progress is about adapting to bigger computers? | paulfchristiano | 1y | 23
134 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
Branch 2 top posts:

Karma | Title | Author | Age | Comments
258 | The Parable of Predict-O-Matic | abramdemski | 3y | 42
234 | chinchilla's wild implications | nostalgebraist | 4mo | 114
185 | Simulators | janus | 3mo | 103
178 | Mysteries of mode collapse | janus | 1mo | 35
163 | Language models seem to be much better than humans at next-token prediction | Buck | 4mo | 56
157 | Transformer Circuits | evhub | 12mo | 4
145 | Decision theory does not imply that we get to have nice things | So8res | 2mo | 53
143 | Conjecture: a retrospective after 8 months of work | Connor Leahy | 27d | 9
141 | Announcing the Inverse Scaling Prize ($250k Prize Pool) | Ethan Perez | 5mo | 14
119 | Monitoring for deceptive alignment | evhub | 3mo | 7
118 | We Are Conjecture, A New Alignment Research Startup | Connor Leahy | 8mo | 24
113 | The case for becoming a black-box investigator of language models | Buck | 7mo | 19
108 | What I Learned Running Refine | adamShimi | 26d | 5
108 | Beyond Astronomical Waste | Wei_Dai | 4y | 41