Audio · Adversarial Examples · AXRP (13 posts)

50 karma · AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger · DanielFilan · 1y · 10 comments
47 karma · AXRP Episode 10 - AI’s Future and Impacts with Katja Grace · DanielFilan · 1y · 2 comments
46 karma · AXRP Episode 7 - Side Effects with Victoria Krakovna · DanielFilan · 1y · 6 comments
41 karma · AXRP Episode 12 - AI Existential Risk with Paul Christiano · DanielFilan · 1y · 0 comments
37 karma · AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch · DanielFilan · 1y · 0 comments
36 karma · If I were a well-intentioned AI... I: Image classifier · Stuart_Armstrong · 2y · 4 comments
32 karma · [AN #62] Are adversarial examples caused by real but imperceptible features? · Rohin Shah · 3y · 10 comments
28 karma · AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes · DanielFilan · 1y · 3 comments
26 karma · AXRP Episode 8 - Assistance Games with Dylan Hadfield-Menell · DanielFilan · 1y · 1 comment
26 karma · AXRP Episode 11 - Attainable Utility and Power with Alex Turner · DanielFilan · 1y · 5 comments
18 karma · AXRP Episode 2 - Learning Human Biases with Rohin Shah · DanielFilan · 1y · 0 comments
18 karma · The Goodhart Game · John_Maxwell · 3y · 5 comments
16 karma · AXRP Episode 1 - Adversarial Policies with Adam Gleave · DanielFilan · 1y · 5 comments

Redwood Research · Adversarial Training · AI Robustness (18 posts)

170 karma · Redwood Research’s current project · Buck · 1y · 29 comments
134 karma · Takeaways from our robust injury classifier project [Redwood Research] · dmz · 3mo · 9 comments
129 karma · Why I'm excited about Redwood Research's current project · paulfchristiano · 1y · 6 comments
118 karma · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · maxnadeau · 1mo · 14 comments
106 karma · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC · 17d · 9 comments
98 karma · High-stakes alignment via adversarial training [Redwood Research report] · dmz · 7mo · 29 comments
73 karma · Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · KevinRoWang · 1mo · 5 comments
54 karma · Redwood's Technique-Focused Epistemic Strategy · adamShimi · 1y · 1 comment
39 karma · Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing · Buck · 6mo · 0 comments
30 karma · AXRP Episode 15 - Natural Abstractions with John Wentworth · DanielFilan · 7mo · 1 comment
27 karma · Causal scrubbing: results on a paren balance checker · LawrenceC · 17d · 0 comments
26 karma · AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy · DanielFilan · 8mo · 9 comments
26 karma · AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo · DanielFilan · 8mo · 1 comment
18 karma · AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler · DanielFilan · 4mo · 0 comments