Tree of Tags

62 posts Interviews Redwood Research Organization Updates AXRP Adversarial Examples Adversarial Training AI Robustness

17 posts Audio

130 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

24 Latent Adversarial Training

Adam Jermyn

5mo

9

134 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

86 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

135 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

136 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

143 Redwood Research’s current project

Buck

1y

29

10 AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong

DanielFilan

3mo

1

48 Redwood's Technique-Focused Epistemic Strategy

adamShimi

1y

1

16 AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan

4mo

0

24 deluks917 on Online Weirdos

Jacob Falkovich

4y

3

58 AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah

Palus Astra

2y

27

12 AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan

1y

5

7 Bloggingheads: Yudkowsky and Horgan

Eliezer Yudkowsky

14y

37

46 How and why to turn everything into audio

KatWoods

4mo

18

29 Which LessWrong content would you like recorded into audio/podcast form?

Ruby

3mo

11

37 Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas

Akash

25d

2

131 Announcing the LessWrong Curated Podcast

Ben Pace

6mo

17

26 Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes

1mo

1

74 Listen to top LessWrong posts with The Nonlinear Library

KatWoods

1y

27

6 Cognitive scientist Joel Chan on metascience, scaling and automating innovation, collective intelligence, and tools for thought.

fowlertm

1y

3

13 Podcasts on surveys, slower AI, AI arguments, etc

KatjaGrace

3mo

0

41 AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger

DanielFilan

1y

10

14 Interview with Matt Freeman

Evenflair

29d

0

31 Shahar Avin On How To Regulate Advanced AI Systems

Michaël Trazzi

2mo

0

26 Feelings of Admiration, Ruby <=> Miranda

Ruby

1y

0

39 New: use The Nonlinear Library to listen to the top LessWrong posts of all time

KatWoods

8mo

9

153 Curated conversations with brilliant rationalists

spencerg

1y

18