Tags: AI (1855 posts) · SERI MATS · AI Sentience · Distributional Shifts · AI Robustness · Truthful AI · Adversarial Examples
Tags: Careers (185 posts) · Audio · Interviews · Infra-Bayesianism · Organization Updates · AXRP · Formal Proof · Redwood Research · Domain Theory · Adversarial Training
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 62 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 37 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 21 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0 |
| 153 | The next decades might be wild | Marius Hobbhahn | 5d | 21 |
| 232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 92 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 3 | I believe some AI doomers are overconfident | FTPickle | 6h | 4 |
| 265 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 11 | Solution to The Alignment Problem | Algon | 1d | 0 |
| 92 | Revisiting algorithmic progress | Tamay | 7d | 6 |
| 18 | Event [Berkeley]: Alignment Collaborator Speed-Meeting | AlexMennen | 1d | 2 |
| 83 | Okay, I feel it now | g1 | 7d | 14 |
| 159 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 59 | Predicting GPU performance | Marius Hobbhahn | 6d | 24 |
| 6 | Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic | Akash | 2h | 0 |
| 55 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 2 | Career Scouting: Housing Coordination | koratkar | 5h | 0 |
| 130 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 30 | Where to be an AI Safety Professor | scasper | 13d | 12 |
| 134 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14 |
| 67 | Career Scouting: Dentistry | koratkar | 1mo | 5 |
| 5 | What about non-degree seeking? | Lao Mein | 3d | 5 |
| 26 | Causal scrubbing: results on a paren balance checker | LawrenceC | 17d | 0 |
| 37 | Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas | Akash | 25d | 2 |
| 86 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5 |
| 135 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9 |
| 114 | Understanding Infra-Bayesianism: A Beginner-Friendly Video Series | Jack Parker | 2mo | 6 |
| 15 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9 |