Tree of Tags

4 posts: AXRP, Adversarial Examples

23 posts: Redwood Research, Organization Updates, Adversarial Training, AI Robustness

12 AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan

1y

5

27 [AN #62] Are adversarial examples caused by real but imperceptible features?

Rohin Shah

3y

10

13 The Goodhart Game

John_Maxwell

3y

5

35 If I were a well-intentioned AI... I: Image classifier

Stuart_Armstrong

2y

4

130 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

24 Latent Adversarial Training

Adam Jermyn

5mo

9

134 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

86 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

135 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

136 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

143 Redwood Research’s current project

Buck

1y

29

48 Redwood's Technique-Focused Epistemic Strategy

adamShimi

1y

1

16 AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan

4mo

0

34 Help the Brain Preservation Foundation

aurellem

9y

20

48 Get genotyped for free (If your IQ is high enough)

David Althaus

11y

63

33 Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck

6mo

0

32 Giving What We Can needs your help!

RobertWiblin

7y

6

57 What I've been doing instead of writing

benkuhn

1y

3