Redwood Research (27 posts)
Related tags: Organization Updates, Adversarial Examples, Adversarial Training, AI Robustness

Interviews / AXRP (35 posts)
Karma | Title | Author | Posted | Comments
130 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
24 | Latent Adversarial Training | Adam Jermyn | 5mo | 9
134 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
86 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5
135 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9
136 | High-stakes alignment via adversarial training [Redwood Research report] | dmz | 7mo | 29
143 | Redwood Research’s current project | Buck | 1y | 29
48 | Redwood's Technique-Focused Epistemic Strategy | adamShimi | 1y | 1
16 | AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler | DanielFilan | 4mo | 0
12 | AXRP Episode 1 - Adversarial Policies with Adam Gleave | DanielFilan | 1y | 5
34 | Help the Brain Preservation Foundation | aurellem | 9y | 20
48 | Get genotyped for free (If your IQ is high enough) | David Althaus | 11y | 63
33 | Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing | Buck | 6mo | 0
32 | Giving What We Can needs your help! | RobertWiblin | 7y | 6
10 | AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong | DanielFilan | 3mo | 1
24 | deluks917 on Online Weirdos | Jacob Falkovich | 4y | 3
58 | AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah | Palus Astra | 2y | 27
7 | Bloggingheads: Yudkowsky and Horgan | Eliezer Yudkowsky | 14y | 37
5 | Did you enjoy Ramez Naam's "Nexus" trilogy? Check out this interview on neurotech and the law. | fowlertm | 2mo | 0
26 | See Eliezer talk with PZ Myers and David Brin (and me) about immortality this Sunday | Eneasz | 9y | 5
13 | AXRP Episode 2 - Learning Human Biases with Rohin Shah | DanielFilan | 1y | 0
34 | AXRP Episode 7 - Side Effects with Victoria Krakovna | DanielFilan | 1y | 6
24 | AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra | DanielFilan | 1y | 1
25 | GiveWell interview with major SIAI donor Jaan Tallinn | jsalvatier | 11y | 8
32 | AXRP Episode 15 - Natural Abstractions with John Wentworth | DanielFilan | 7mo | 1
8 | BHTV: Jaron Lanier and Yudkowsky | Eliezer Yudkowsky | 14y | 66
16 | BHTV: Yudkowsky / Robert Greene | Eliezer Yudkowsky | 13y | 24
33 | My hour-long interview with Yudkowsky on "Becoming a Rationalist" | lukeprog | 11y | 22