Tags: AI (1855 posts) · SERI MATS · AI Sentience · Distributional Shifts · AI Robustness · Truthful AI · Adversarial Examples
Tags: Careers (185 posts) · Audio · Interviews · Infra-Bayesianism · Organization Updates · AXRP · Formal Proof · Redwood Research · Domain Theory · Adversarial Training
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 62 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 37 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 21 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0 |
| 153 | The next decades might be wild | Marius Hobbhahn | 5d | 21 |
| 232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 92 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 3 | I believe some AI doomers are overconfident | FTPickle | 6h | 4 |
| 265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 11 | Solution to The Alignment Problem | Algon | 1d | 0 |
| 92 | Revisiting algorithmic progress | Tamay | 7d | 6 |
| 18 | Event [Berkeley]: Alignment Collaborator Speed-Meeting | AlexMennen | 1d | 2 |
| 83 | Okay, I feel it now | g1 | 7d | 14 |
| 159 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 59 | Predicting GPU performance | Marius Hobbhahn | 6d | 24 |
| 6 | Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic | Akash | 2h | 0 |
| 55 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 2 | Career Scouting: Housing Coordination | koratkar | 5h | 0 |
| 130 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 30 | Where to be an AI Safety Professor | scasper | 13d | 12 |
| 134 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14 |
| 67 | Career Scouting: Dentistry | koratkar | 1mo | 5 |
| 5 | What about non-degree seeking? | Lao Mein | 3d | 5 |
| 26 | Causal scrubbing: results on a paren balance checker | LawrenceC | 17d | 0 |
| 37 | Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas | Akash | 25d | 2 |
| 86 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5 |
| 135 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9 |
| 114 | Understanding Infra-Bayesianism: A Beginner-Friendly Video Series | Jack Parker | 2mo | 6 |
| 15 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9 |