1855 posts · AI · SERI MATS · AI Sentience · Distributional Shifts · AI Robustness · Truthful AI · Adversarial Examples
185 posts · Careers · Audio · Interviews · Infra-Bayesianism · Organization Updates · AXRP · Formal Proof · Redwood Research · Domain Theory · Adversarial Training
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 29 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0 |
| 33 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 40 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 199 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 108 | The next decades might be wild | Marius Hobbhahn | 5d | 21 |
| 15 | Solution to The Alignment Problem | Algon | 1d | 0 |
| 95 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 22 | Event [Berkeley]: Alignment Collaborator Speed-Meeting | AlexMennen | 1d | 2 |
| 54 | High-level hopes for AI alignment | HoldenKarnofsky | 5d | 3 |
| 207 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 73 | Revisiting algorithmic progress | Tamay | 7d | 6 |
| 51 | «Boundaries», Part 3b: Alignment problems in terms of boundaries | Andrew_Critch | 6d | 2 |
| 128 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 59 | Okay, I feel it now | g1 | 7d | 14 |
| 7 | Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic | Akash | 2h | 0 |
| 39 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 96 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 28 | Where to be an AI Safety Professor | scasper | 13d | 12 |
| 109 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14 |
| 5 | What about non-degree seeking? | Lao Mein | 3d | 5 |
| 26 | Causal scrubbing: results on a paren balance checker | LawrenceC | 17d | 0 |
| 45 | Career Scouting: Dentistry | koratkar | 1mo | 5 |
| 32 | Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas | Akash | 25d | 2 |
| 67 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5 |
| 16 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9 |
| 127 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9 |
| 14 | Causal scrubbing: Appendix | LawrenceC | 17d | 0 |
| 28 | The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) | Jessica Mary | 1mo | 2 |