AI (2237 posts)
Related tags: AI Timelines, AI Takeoff, Careers, Audio, Infra-Bayesianism, DeepMind, Interviews, SERI MATS, Dialogue (format), Agent Foundations, Redwood Research

Iterated Amplification (358 posts)
Related tags: Myopia, Factored Cognition, Humans Consulting HCH, Corrigibility, Interpretability (ML & AI), Debate (AI safety technique), Experiments, Self Fulfilling/Refuting Prophecies, Ought, Orthogonality Thesis, Instrumental Convergence
Karma | Title | Author | Posted | Comments
531 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
436 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
432 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34
404 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
394 | Why I think strong general AI is coming soon | porby | 2mo | 126
373 | We Choose To Align AI | johnswentworth | 11mo | 15
332 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60
331 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60
323 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
314 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54
310 | Are we in an AI overhang? | Andy Jones | 2y | 109
291 | Fun with +12 OOMs of Compute | Daniel Kokotajlo | 1y | 78
287 | Don't die with dignity; instead play to your outs | Jeffrey Ladish | 8mo | 58
282 | AGI Safety FAQ / all-dumb-questions-allowed thread | Aryeh Englander | 6mo | 514
446 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39
250 | The Plan - 2022 Update | johnswentworth | 19d | 33
239 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60
235 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27
208 | Sorting Pebbles Into Correct Heaps | Eliezer Yudkowsky | 14y | 109
201 | Interpreting Neural Networks through the Polytope Lens | Sid Black | 2mo | 26
194 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38
184 | Godzilla Strategies | johnswentworth | 6mo | 65
164 | Understanding “Deep Double Descent” | evhub | 3y | 51
151 | Re-Examining LayerNorm | Eric Winsor | 19d | 8
140 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16
140 | How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | Collin | 5d | 18
139 | Paul's research agenda FAQ | zhukeepa | 4y | 73
134 | Circumventing interpretability: How to defeat mind-readers | Lee Sharkey | 5mo | 8