Tree of Tags

62 posts: Interviews, Redwood Research, Organization Updates, AXRP, Adversarial Examples, Adversarial Training, AI Robustness

17 posts: Audio

164 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

159 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

105 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

26 Causal scrubbing: results on a paren balance checker

LawrenceC

17d

0

143 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

18 Causal scrubbing: Appendix

LawrenceC

17d

0

184 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

31 Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming

Michaël Trazzi

3mo

0

121 Redwood Research’s current project

Buck

1y

29

98 Why I'm excited about Redwood Research's current project

paulfchristiano

1y

6

108 I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead

lsusr

1y

33

31 Latent Adversarial Training

Adam Jermyn

5mo

9

36 AXRP Episode 15 - Natural Abstractions with John Wentworth

DanielFilan

7mo

1

28 Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck

6mo

0

5 Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Akash

2h

0

42 Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas

Akash

25d

2

156 Announcing the LessWrong Curated Podcast

Ben Pace

6mo

17

26 Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes

1mo

1

12 Interview with Matt Freeman

Evenflair

29d

0

37 Shahar Avin On How To Regulate Advanced AI Systems

Michaël Trazzi

2mo

0

43 How and why to turn everything into audio

KatWoods

4mo

18

22 Which LessWrong content would you like recorded into audio/podcast form?

Ruby

3mo

11

165 Curated conversations with brilliant rationalists

spencerg

1y

18

104 Listen to top LessWrong posts with The Nonlinear Library

KatWoods

1y

27

57 New: use The Nonlinear Library to listen to the top LessWrong posts of all time

KatWoods

8mo

9

20 An Audio Introduction to Nick Bostrom

PeterH

3mo

0

35 Steganography and the CycleGAN - alignment failure case study

Jan Czechowski

6mo

0

13 Podcasts on surveys, slower AI, AI arguments, etc

KatjaGrace

3mo

0