13 posts
Audio
Adversarial Examples
AXRP
18 posts
Redwood Research
Adversarial Training
AI Robustness
41
AXRP Episode 12 - AI Existential Risk with Paul Christiano
DanielFilan
1y
0
47
AXRP Episode 10 - AI’s Future and Impacts with Katja Grace
DanielFilan
1y
2
46
AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan
1y
6
50
AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger
DanielFilan
1y
10
26
AXRP Episode 11 - Attainable Utility and Power with Alex Turner
DanielFilan
1y
5
37
AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch
DanielFilan
1y
0
26
AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell
DanielFilan
1y
1
28
AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes
DanielFilan
1y
3
36
If I were a well-intentioned AI... I: Image classifier
Stuart_Armstrong
2y
4
18
AXRP Episode 2 - Learning Human Biases with Rohin Shah
DanielFilan
1y
0
32
[AN #62] Are adversarial examples caused by real but imperceptible features?
Rohin Shah
3y
10
16
AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan
1y
5
18
The Goodhart Game
John_Maxwell
3y
5
106
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC
17d
9
118
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau
1mo
14
27
Causal scrubbing: results on a paren balance checker
LawrenceC
17d
0
73
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang
1mo
5
134
Takeaways from our robust injury classifier project [Redwood Research]
dmz
3mo
9
15
Causal scrubbing: Appendix
LawrenceC
17d
0
98
High-stakes alignment via adversarial training [Redwood Research report]
dmz
7mo
29
170
Redwood Research’s current project
Buck
1y
29
129
Why I'm excited about Redwood Research's current project
paulfchristiano
1y
6
39
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck
6mo
0
18
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
4mo
0
30
AXRP Episode 15 - Natural Abstractions with John Wentworth
DanielFilan
7mo
1
54
Redwood's Technique-Focused Epistemic Strategy
adamShimi
1y
1
18
AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving
DanielFilan
5mo
0