Tags (103 posts):
- Interpretability (ML & AI)
- Machine Learning (ML)
- DeepMind
- Truth, Semantics, & Meaning
- AI Success Models
- OpenAI
- Lottery Ticket Hypothesis
- Anthropic
- Conservatism (AI)
- Honesty
- Principal-Agent Problems
- Map and Territory
Tags (50 posts):
- GPT
- Bounties & Prizes (active)
- AI-assisted Alignment
- Moore's Law
- Compute
- Nanotechnology
- List of Links
- AI Safety Public Materials
- Computer Science
- Tripwire
- Quantum Mechanics
Karma | Title | Author | Posted | Comments
364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
226 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
197 | Chris Olah’s views on AGI safety | evhub | 3y | 38
146 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
136 | A transparency and interpretability tech tree | evhub | 6mo | 10
135 | Understanding “Deep Double Descent” | evhub | 3y | 51
125 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
118 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
Karma | Title | Author | Posted | Comments
223 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
187 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74
158 | interpreting GPT: the logit lens | nostalgebraist | 2y | 32
151 | Godzilla Strategies | johnswentworth | 6mo | 65
140 | Developmental Stages of GPTs | orthonormal | 2y | 74
136 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
120 | Moore's Law, AI, and the pace of progress | Veedrac | 1y | 39
114 | How much chess engine progress is about adapting to bigger computers? | paulfchristiano | 1y | 23
114 | Can you get AGI from a Transformer? | Steven Byrnes | 2y | 39
111 | Alignment As A Bottleneck To Usefulness Of GPT-3 | johnswentworth | 2y | 57
93 | [Link] Why I’m optimistic about OpenAI’s alignment approach | janleike | 15d | 13
92 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
91 | Compute Trends Across Three eras of Machine Learning | Jsevillamol | 10mo | 13
89 | Collection of GPT-3 results | Kaj_Sotala | 2y | 24