27 posts
Redwood Research
Organization Updates
Adversarial Examples
Adversarial Training
AI Robustness
35 posts
Interviews
AXRP
96
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
109
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
26
Causal scrubbing: results on a paren balance checker
LawrenceC
17d
0
67
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
127
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
14
Causal scrubbing: Appendix
LawrenceC
17d
0
88
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
165
Redwood Research’s current project
Buck
1y
29
126
Why I'm excited about Redwood Research's current project
paulfchristiano
1y
6
38
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck
6mo
0
17
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
4mo
0
52
Redwood's Technique-Focused Epistemic Strategy
adamShimi
1y
1
62
We're Redwood Research, we do applied alignment research, AMA
Nate Thomas
1y
3
64
What I've been doing instead of writing
benkuhn
1y
3
112
I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead
lsusr
1y
33
19
Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming
Michaël Trazzi
3mo
0
28
AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan
7mo
1
73
AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant
DanielFilan
1y
2
18
AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving
DanielFilan
5mo
0
25
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy
DanielFilan
8mo
9
39
AXRP Episode 12 - AI Existential Risk with Paul Christiano
DanielFilan
1y
0
25
AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo
DanielFilan
8mo
1
5
Did you enjoy Ramez Naam's "Nexus" trilogy? Check out this interview on neurotech and the law.
fowlertm
2mo
0
8
AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong
DanielFilan
3mo
1
46
AXRP Episode 10 - AI’s Future and Impacts with Katja Grace
DanielFilan
1y
2
45
AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan
1y
6
68
AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah
Palus Astra
2y
27
33
AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
DanielFilan
1y
1