27 posts
Redwood Research
Organization Updates
Adversarial Examples
Adversarial Training
AI Robustness
35 posts
Interviews
AXRP
96
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
17
Latent Adversarial Training
Adam Jermyn
5mo
9
109
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
67
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
127
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
88
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
165
Redwood Research’s current project
Buck
1y
29
52
Redwood's Technique-Focused Epistemic Strategy
adamShimi
1y
1
17
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
4mo
0
16
AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan
1y
5
43
Help the Brain Preservation Foundation
aurellem
9y
20
61
Get genotyped for free ( If your IQ is high enough)
David Althaus
11y
63
38
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck
6mo
0
40
Giving What We Can needs your help!
RobertWiblin
7y
6
8
AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong
DanielFilan
3mo
1
26
deluks917 on Online Weirdos
Jacob Falkovich
4y
3
68
AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah
Palus Astra
2y
27
8
Bloggingheads: Yudkowsky and Horgan
Eliezer Yudkowsky
14y
37
5
Did you enjoy Ramez Naam's "Nexus" trilogy? Check out this interview on neurotech and the law.
fowlertm
2mo
0
34
See Eliezer talk with PZ Myers and David Brin (and me) about immortality this Sunday
Eneasz
9y
5
18
AXRP Episode 2 - Learning Human Biases with Rohin Shah
DanielFilan
1y
0
45
AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan
1y
6
33
AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
DanielFilan
1y
1
32
GiveWell interview with major SIAI donor Jaan Tallinn
jsalvatier
11y
8
28
AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan
7mo
1
9
BHTV: Jaron Lanier and Yudkowsky
Eliezer Yudkowsky
14y
66
18
BHTV: Yudkowsky / Robert Greene
Eliezer Yudkowsky
13y
24
41
My hour-long interview with Yudkowsky on "Becoming a Rationalist"
lukeprog
11y
22