1855 posts · AI · SERI MATS · AI Sentience · Distributional Shifts · AI Robustness · Truthful AI · Adversarial Examples
185 posts · Careers · Audio · Interviews · Infra-Bayesianism · Organization Updates · AXRP · Formal Proof · Redwood Research · Domain Theory · Adversarial Training
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 29 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0 |
| 33 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 40 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 199 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 108 | The next decades might be wild | Marius Hobbhahn | 5d | 21 |
| 15 | Solution to The Alignment Problem | Algon | 1d | 0 |
| 95 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 22 | Event [Berkeley]: Alignment Collaborator Speed-Meeting | AlexMennen | 1d | 2 |
| 54 | High-level hopes for AI alignment | HoldenKarnofsky | 5d | 3 |
| 207 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30 |
| 73 | Revisiting algorithmic progress | Tamay | 7d | 6 |
| 51 | «Boundaries», Part 3b: Alignment problems in terms of boundaries | Andrew_Critch | 6d | 2 |
| 128 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 59 | Okay, I feel it now | g1 | 7d | 14 |
| 7 | Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic | Akash | 2h | 0 |
| 39 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2 |
| 96 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 28 | Where to be an AI Safety Professor | scasper | 13d | 12 |
| 109 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14 |
| 5 | What about non-degree seeking? | Lao Mein | 3d | 5 |
| 26 | Causal scrubbing: results on a paren balance checker | LawrenceC | 17d | 0 |
| 45 | Career Scouting: Dentistry | koratkar | 1mo | 5 |
| 32 | Podcast: Shoshannah Tekofsky on skilling up in AI safety, visiting Berkeley, and developing novel research ideas | Akash | 25d | 2 |
| 67 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5 |
| 16 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9 |
| 127 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9 |
| 14 | Causal scrubbing: Appendix | LawrenceC | 17d | 0 |
| 28 | The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard) | Jessica Mary | 1mo | 2 |