620 posts: AI, Eliciting Latent Knowledge (ELK), AI Robustness, Truthful AI, Autonomy and Choice, Intelligence Explosion, Social Media, Transcripts
114 posts: Infra-Bayesianism, Counterfactuals, Interviews, Audio, Logic & Mathematics, AXRP, Redwood Research, Domain Theory, Counterfactual Mugging, Newcomb's Problem, Formal Proof, Functional Decision Theory
Karma | Title | Author | Age | Comments
242 | Visible Thoughts Project and Bounty Announcement | So8res | 1y | 104
241 | Discussion with Eliezer Yudkowsky on AGI interventions | Rob Bensinger | 1y | 257
231 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
219 | ARC's first technical report: Eliciting Latent Knowledge | paulfchristiano | 1y | 88
217 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
213 | Hiring engineers and researchers to help align GPT-3 | paulfchristiano | 2y | 14
211 | Safetywashing | Adam Scholl | 5mo | 17
205 | The Plan | johnswentworth | 1y | 77
205 | Ngo and Yudkowsky on alignment difficulty | Eliezer Yudkowsky | 1y | 143
202 | Attempted Gears Analysis of AGI Intervention Discussion With Eliezer | Zvi | 1y | 48
194 | Announcing the Alignment Research Center | paulfchristiano | 1y | 6
185 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
177 | A note about differential technological development | So8res | 5mo | 31
170 | Redwood Research’s current project | Buck | 1y | 29
134 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9
129 | Why I'm excited about Redwood Research's current project | paulfchristiano | 1y | 6
121 | Introduction To The Infra-Bayesianism Sequence | Diffractor | 2y | 64
118 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
106 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
98 | High-stakes alignment via adversarial training [Redwood Research report] | dmz | 7mo | 29
98 | Infra-Bayesian physicalism: a formal theory of naturalized induction | Vanessa Kosoy | 1y | 20
95 | Counterfactual Mugging Poker Game | Scott Garrabrant | 4y | 2
93 | Zoom In: An Introduction to Circuits | evhub | 2y | 11
75 | AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant | DanielFilan | 1y | 2
73 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5
69 | Recent Progress in the Theory of Neural Networks | interstice | 3y | 9
60 | MIRI/OP exchange about decision theory | Rob Bensinger | 1y | 7