Tags: Interviews, Redwood Research, Organization Updates, AXRP, Adversarial Examples, Adversarial Training, AI Robustness (62 posts)
Audio (17 posts)
Karma · Title · Author · Posted · Comments
130 · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC · 17d · 9
134 · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · maxnadeau · 1mo · 14
26 · Causal scrubbing: results on a paren balance checker · LawrenceC · 17d · 0
86 · Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · KevinRoWang · 1mo · 5
135 · Takeaways from our robust injury classifier project [Redwood Research] · dmz · 3mo · 9
16 · Causal scrubbing: Appendix · LawrenceC · 17d · 0
136 · High-stakes alignment via adversarial training [Redwood Research report] · dmz · 7mo · 29
143 · Redwood Research’s current project · Buck · 1y · 29
112 · Why I'm excited about Redwood Research's current project · paulfchristiano · 1y · 6
25 · Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming · Michaël Trazzi · 3mo · 0
110 · I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead · lsusr · 1y · 33
33 · Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing · Buck · 6mo · 0
32 · AXRP Episode 15 - Natural Abstractions with John Wentworth · DanielFilan · 7mo · 1
16 · AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler · DanielFilan · 4mo · 0
6 · Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic · Akash · 2h · 0
37 · Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas · Akash · 25d · 2
131 · Announcing the LessWrong Curated Podcast · Ben Pace · 6mo · 17
14 · Interview with Matt Freeman · Evenflair · 29d · 0
26 · Me (Steve Byrnes) on the “Brain Inspired” podcast · Steven Byrnes · 1mo · 1
31 · Shahar Avin On How To Regulate Advanced AI Systems · Michaël Trazzi · 2mo · 0
46 · How and why to turn everything into audio · KatWoods · 4mo · 18
29 · Which LessWrong content would you like recorded into audio/podcast form? · Ruby · 3mo · 11
153 · Curated conversations with brilliant rationalists · spencerg · 1y · 18
13 · Podcasts on surveys, slower AI, AI arguments, etc · KatjaGrace · 3mo · 0
74 · Listen to top LessWrong posts with The Nonlinear Library · KatWoods · 1y · 27
39 · New: use The Nonlinear Library to listen to the top LessWrong posts of all time · KatWoods · 8mo · 9
28 · Steganography and the CycleGAN - alignment failure case study · Jan Czechowski · 6mo · 0
12 · An Audio Introduction to Nick Bostrom · PeterH · 3mo · 0