Score · Title · Author · Posted · Comments
84 · Towards Hodge-podge Alignment · Cleo Nardo · 1d · 20 comments
198 · The next decades might be wild · Marius Hobbhahn · 5d · 21 comments
6 · I believe some AI doomers are overconfident · FTPickle · 6h · 4 comments
41 · The "Minimal Latents" Approach to Natural Abstractions · johnswentworth · 22h · 6 comments
52 · Existential AI Safety is NOT separate from near-term applications · scasper · 7d · 15 comments
11 · Will Machines Ever Rule the World? MLAISU W50 · Esben Kran · 4d · 4 comments
89 · Trying to disambiguate different questions about whether RLHF is “good” · Buck · 6d · 39 comments
282 · AGI Safety FAQ / all-dumb-questions-allowed thread · Aryeh Englander · 6mo · 514 comments
19 · Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) · Remmelt · 1d · 6 comments
190 · Using GPT-Eliezer against ChatGPT Jailbreaking · Stuart_Armstrong · 14d · 77 comments
25 · If Wentworth is right about natural abstractions, it would be bad for alignment · Wuschel Schulz · 12d · 5 comments
111 · Revisiting algorithmic progress · Tamay · 7d · 6 comments
74 · Predicting GPU performance · Marius Hobbhahn · 6d · 24 comments
35 · Is the AI timeline too short to have children? · Yoreth · 6d · 20 comments
5 · What about non-degree seeking? · Lao Mein · 3d · 5 comments
6 · [ASoT] Reflectivity in Narrow AI · Ulisse Mini · 29d · 1 comment
32 · Where to be an AI Safety Professor · scasper · 13d · 12 comments
71 · Proper scoring rules don’t guarantee predicting fixed points · Johannes_Treutlein · 4d · 2 comments
14 · Is the "Valley of Confused Abstractions" real? · jacquesthibs · 15d · 9 comments
164 · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC · 17d · 9 comments
31 · Latent Adversarial Training · Adam Jermyn · 5mo · 9 comments
30 · Vanessa Kosoy's PreDCA, distilled · Martín Soto · 1mo · 17 comments
103 · Infra-Bayesian physicalism: a formal theory of naturalized induction · Vanessa Kosoy · 1y · 20 comments
138 · Taking the parameters which seem to matter and rotating them until they don't · Garrett Baker · 3mo · 48 comments
159 · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · maxnadeau · 1mo · 14 comments
26 · Guardian AI (Misaligned systems are all around us.) · Jessica Mary · 25d · 6 comments
82 · Neural Tangent Kernel Distillation · Thomas Larsen · 2mo · 20 comments
36 · Why I'm Working On Model Agnostic Interpretability · Jessica Mary · 1mo · 9 comments