Tree of Tags

27 posts: Redwood Research, Organization Updates, Adversarial Examples, Adversarial Training, AI Robustness

35 posts: Interviews, AXRP

96 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

109 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

26 Causal scrubbing: results on a paren balance checker

LawrenceC

17d

0

67 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

127 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

14 Causal scrubbing: Appendix

LawrenceC

17d

0

88 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

165 Redwood Research’s current project

Buck

1y

29

126 Why I'm excited about Redwood Research's current project

paulfchristiano

1y

6

38 Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck

6mo

0

17 AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan

4mo

0

52 Redwood's Technique-Focused Epistemic Strategy

adamShimi

1y

1

62 We're Redwood Research, we do applied alignment research, AMA

Nate Thomas

1y

3

64 What I've been doing instead of writing

benkuhn

1y

3

112 I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

lsusr

1y

33

19 Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming

Michaël Trazzi

3mo

0

28 AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan

7mo

1

73 AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

DanielFilan

1y

2

18 AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving

DanielFilan

5mo

0

25 AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy

DanielFilan

8mo

9

39 AXRP Episode 12 - AI Existential Risk with Paul Christiano

DanielFilan

1y

0

25 AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo

DanielFilan

8mo

1

5 Did you enjoy Ramez Naam's "Nexus" trilogy? Check out this interview on neurotech and the law.

fowlertm

2mo

0

8 AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong

DanielFilan

3mo

1

46 AXRP Episode 10 - AI’s Future and Impacts with Katja Grace

DanielFilan

1y

2

45 AXRP Episode 7 - Side Effects with Victoria Krakovna

DanielFilan

1y

6

68 AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah

Palus Astra

2y

27

33 AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra

DanielFilan

1y

1