AI (2237 posts). Related tags: AI Timelines, AI Takeoff, Careers, Audio, Infra-Bayesianism, DeepMind, Interviews, SERI MATS, Dialogue (format), Agent Foundations, Redwood Research
Iterated Amplification (358 posts). Related tags: Myopia, Factored Cognition, Humans Consulting HCH, Corrigibility, Interpretability (ML & AI), Debate (AI safety technique), Experiments, Self Fulfilling/Refuting Prophecies, Ought, Orthogonality Thesis, Instrumental Convergence

| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 364 | DeepMind alignment team opinions on AGI ruin arguments | Vika | 4mo | 34 |
| 344 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83 |
| 314 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33 |
| 310 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89 |
| 303 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60 |
| 287 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60 |
| 269 | Why I think strong general AI is coming soon | porby | 2mo | 126 |
| 265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 259 | We Choose To Align AI | johnswentworth | 11mo | 15 |
| 255 | Are we in an AI overhang? | Andy Jones | 2y | 109 |
| 247 | Why Agent Foundations? An Overly Abstract Explanation | johnswentworth | 9mo | 54 |
| 247 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53 |
| 245 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104 |
| 243 | Don't die with dignity; instead play to your outs | Jeffrey Ladish | 8mo | 58 |
| 338 | A Mechanistic Interpretability Analysis of Grokking | Neel Nanda | 4mo | 39 |
| 211 | The Plan - 2022 Update | johnswentworth | 19d | 33 |
| 205 | Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More | Ben Pace | 3y | 60 |
| 170 | Sorting Pebbles Into Correct Heaps | Eliezer Yudkowsky | 14y | 109 |
| 159 | The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable | beren | 22d | 27 |
| 153 | Seeking Power is Often Convergently Instrumental in MDPs | TurnTrout | 3y | 38 |
| 151 | Godzilla Strategies | johnswentworth | 6mo | 65 |
| 144 | Self-fulfilling correlations | PhilGoetz | 12y | 50 |
| 139 | Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers | lifelonglearner | 1y | 16 |
| 136 | MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models" | Rob Bensinger | 1y | 13 |
| 136 | A transparency and interpretability tech tree | evhub | 6mo | 10 |
| 135 | Understanding “Deep Double Descent” | evhub | 3y | 51 |
| 125 | Paul's research agenda FAQ | zhukeepa | 4y | 73 |
| 125 | Debate update: Obfuscated arguments problem | Beth Barnes | 1y | 21 |