27 posts
Redwood Research
Organization Updates
Adversarial Examples
Adversarial Training
AI Robustness
35 posts
Interviews
AXRP
184
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
164
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
159
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
143
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
121
Redwood Research’s current project
Buck
1y
29
105
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
98
Why I'm excited about Redwood Research's current project
paulfchristiano
1y
6
50
What I've been doing instead of writing
benkuhn
1y
3
50
We're Redwood Research, we do applied alignment research, AMA
Nate Thomas
1y
3
45
Two clarifications about "Strategic Background"
Rob Bensinger
4y
6
44
Redwood's Technique-Focused Epistemic Strategy
adamShimi
1y
1
37
Genomic Prediction is now offering embryo selection
gwern
4y
1
35
Get genotyped for free (If your IQ is high enough)
David Althaus
11y
63
35
If I were a well-intentioned AI... I: Image classifier
Stuart_Armstrong
2y
4
108
I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead
lsusr
1y
33
97
[Transcript] Richard Feynman on Why Questions
Grognor
10y
45
51
Conversation with Paul Christiano
abergal
3y
6
48
AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah
Palus Astra
2y
27
39
AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant
DanielFilan
1y
2
36
AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan
7mo
1
33
AXRP Episode 12 - AI Existential Risk with Paul Christiano
DanielFilan
1y
0
31
Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming
Michaël Trazzi
3mo
0
28
Robin Hanson on the futurist focus on AI
abergal
3y
24
25
My hour-long interview with Yudkowsky on "Becoming a Rationalist"
lukeprog
11y
22
25
Muehlhauser-Wang Dialogue
lukeprog
10y
288
23
AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan
1y
6
23
AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo
DanielFilan
8mo
1
22
deluks917 on Online Weirdos
Jacob Falkovich
4y
3