13 posts
Audio
Adversarial Examples
AXRP
18 posts
Redwood Research
Adversarial Training
AI Robustness
36
AXRP Episode 12 - AI Existential Risk with Paul Christiano
DanielFilan
1y
0
34
AXRP Episode 10 - AI’s Future and Impacts with Katja Grace
DanielFilan
1y
2
41
AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger
DanielFilan
1y
10
34
AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan
1y
6
19
AXRP Episode 11 - Attainable Utility and Power with Alex Turner
DanielFilan
1y
5
22
AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell
DanielFilan
1y
1
24
AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes
DanielFilan
1y
3
26
AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch
DanielFilan
1y
0
35
If I were a well-intentioned AI... I: Image classifier
Stuart_Armstrong
2y
4
27
[AN #62] Are adversarial examples caused by real but imperceptible features?
Rohin Shah
3y
10
13
AXRP Episode 2 - Learning Human Biases with Rohin Shah
DanielFilan
1y
0
12
AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan
1y
5
13
The Goodhart Game
John_Maxwell
3y
5
130
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
134
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
26
Causal scrubbing: results on a paren balance checker
LawrenceC
17d
0
86
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
135
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
16
Causal scrubbing: Appendix
LawrenceC
17d
0
136
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
143
Redwood Research’s current project
Buck
1y
29
112
Why I'm excited about Redwood Research's current project
paulfchristiano
1y
6
33
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck
6mo
0
32
AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan
7mo
1
16
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
4mo
0
24
Latent Adversarial Training
Adam Jermyn
5mo
9
48
Redwood's Technique-Focused Epistemic Strategy
adamShimi
1y
1