Tags (103 posts):
- Interpretability (ML & AI)
- Machine Learning (ML)
- DeepMind
- Truth, Semantics, & Meaning
- AI Success Models
- OpenAI
- Lottery Ticket Hypothesis
- Anthropic
- Conservatism (AI)
- Honesty
- Principal-Agent Problems
- Map and Territory
Tags (50 posts):
- GPT
- Bounties & Prizes (active)
- AI-assisted Alignment
- Moore's Law
- Compute
- Nanotechnology
- List of Links
- AI Safety Public Materials
- Computer Science
- Tripwire
- Quantum Mechanics
Karma | Title | Author | Posted | Comments
364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
226 | Common misconceptions about OpenAI | Jacob_Hilton | 3mo | 138
211 | The Plan - 2022 Update | johnswentworth | 19d | 33
197 | Chris Olah’s views on AGI safety | evhub | 3y | 38
146 | the scaling “inconsistency”: openAI’s new insight | nostalgebraist | 2y | 14
139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
136 | A transparency and interpretability tech tree | evhub | 6mo | 10
135 | Understanding “Deep Double Descent” | evhub | 3y | 51
125 | A Bird's Eye View of the ML Field [Pragmatic AI Safety #2] | Dan H | 7mo | 5
123 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
118 | Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc | johnswentworth | 6mo | 52
111 | Interpretability/Tool-ness/Alignment/Corrigibility are not Composable | johnswentworth | 4mo | 8
Karma | Title | Author | Posted | Comments
223 | New Scaling Laws for Large Language Models | 1a3orn | 8mo | 21
187 | The case for aligning narrowly superhuman models | Ajeya Cotra | 1y | 74
158 | interpreting GPT: the logit lens | nostalgebraist | 2y | 32
151 | Godzilla Strategies | johnswentworth | 6mo | 65
140 | Developmental Stages of GPTs | orthonormal | 2y | 74
136 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13
120 | Moore's Law, AI, and the pace of progress | Veedrac | 1y | 39
114 | How much chess engine progress is about adapting to bigger computers? | paulfchristiano | 1y | 23
114 | Can you get AGI from a Transformer? | Steven Byrnes | 2y | 39
111 | Alignment As A Bottleneck To Usefulness Of GPT-3 | johnswentworth | 2y | 57
93 | [Link] Why I’m optimistic about OpenAI’s alignment approach | janleike | 15d | 13
92 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3mo | 4
91 | Compute Trends Across Three eras of Machine Learning | Jsevillamol | 10mo | 13
89 | Collection of GPT-3 results | Kaj_Sotala | 2y | 24