Tag group (106 posts): Careers, Infra-Bayesianism, SERI MATS, Formal Proof, Domain Theory, Distributional Shifts
Tag group (79 posts): Audio, Interviews, Organization Updates, Redwood Research, AXRP, Adversarial Examples, Adversarial Training, AI Robustness
What about non-degree seeking? · Lao Mein · 3d · 5 karma · 5 comments
[ASoT] Reflectivity in Narrow AI · Ulisse Mini · 29d · 6 karma · 1 comment
Where to be an AI Safety Professor · scasper · 13d · 30 karma · 12 comments
Proper scoring rules don’t guarantee predicting fixed points · Johannes_Treutlein · 4d · 55 karma · 2 comments
Is the "Valley of Confused Abstractions" real? · jacquesthibs · 15d · 15 karma · 9 comments
Vanessa Kosoy's PreDCA, distilled · Martín Soto · 1mo · 16 karma · 17 comments
Infra-Bayesian physicalism: a formal theory of naturalized induction · Vanessa Kosoy · 1y · 98 karma · 20 comments
Taking the parameters which seem to matter and rotating them until they don't · Garrett Baker · 3mo · 117 karma · 48 comments
Guardian AI (Misaligned systems are all around us.) · Jessica Mary · 25d · 15 karma · 6 comments
Neural Tangent Kernel Distillation · Thomas Larsen · 2mo · 68 karma · 20 comments
Why I'm Working On Model Agnostic Interpretability · Jessica Mary · 1mo · 28 karma · 9 comments
Career Scouting: Dentistry · koratkar · 1mo · 67 karma · 5 comments
Working towards AI alignment is better · Johannes C. Mayer · 11d · 7 karma · 2 comments
How do you get a job as a software developer? · lsusr · 4mo · 22 karma · 24 comments
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC · 17d · 130 karma · 9 comments
Latent Adversarial Training · Adam Jermyn · 5mo · 24 karma · 9 comments
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · maxnadeau · 1mo · 134 karma · 14 comments
How and why to turn everything into audio · KatWoods · 4mo · 46 karma · 18 comments
Which LessWrong content would you like recorded into audio/podcast form? · Ruby · 3mo · 29 karma · 11 comments
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · KevinRoWang · 1mo · 86 karma · 5 comments
Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas · Akash · 25d · 37 karma · 2 comments
Takeaways from our robust injury classifier project [Redwood Research] · dmz · 3mo · 135 karma · 9 comments
Announcing the LessWrong Curated Podcast · Ben Pace · 6mo · 131 karma · 17 comments
High-stakes alignment via adversarial training [Redwood Research report] · dmz · 7mo · 136 karma · 29 comments
Redwood Research’s current project · Buck · 1y · 143 karma · 29 comments
Me (Steve Byrnes) on the “Brain Inspired” podcast · Steven Byrnes · 1mo · 26 karma · 1 comment
AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong · DanielFilan · 3mo · 10 karma · 1 comment
Listen to top LessWrong posts with The Nonlinear Library · KatWoods · 1y · 74 karma · 27 comments