4 posts
AXRP
Adversarial Examples
23 posts
Redwood Research
Organization Updates
Adversarial Training
AI Robustness
12
AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan
1y
5
27
[AN #62] Are adversarial examples caused by real but imperceptible features?
Rohin Shah
3y
10
13
The Goodhart Game
John_Maxwell
3y
5
35
If I were a well-intentioned AI... I: Image classifier
Stuart_Armstrong
2y
4
130
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
24
Latent Adversarial Training
Adam Jermyn
5mo
9
134
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
86
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
135
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
136
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
143
Redwood Research’s current project
Buck
1y
29
48
Redwood's Technique-Focused Epistemic Strategy
adamShimi
1y
1
16
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
4mo
0
34
Help the Brain Preservation Foundation
aurellem
9y
20
48
Get genotyped for free ( If your IQ is high enough)
David Althaus
11y
63
33
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck
6mo
0
32
Giving What We Can needs your help!
RobertWiblin
7y
6
57
What I've been doing instead of writing
benkuhn
1y
3