Tags: AI (1855 posts), SERI MATS, AI Sentience, Distributional Shifts, AI Robustness, Truthful AI, Adversarial Examples, Careers (185 posts), Audio, Interviews, Infra-Bayesianism, Organization Updates, AXRP, Formal Proof, Redwood Research, Domain Theory, Adversarial Training
Posts (karma | title | author | posted | comments):

40 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
108 | The next decades might be wild | Marius Hobbhahn | 5d | 21
0 | I believe some AI doomers are overconfident | FTPickle | 6h | 4
33 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
22 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
13 | Will Machines Ever Rule the World? MLAISU W50 | Esben Kran | 4d | 4
95 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39
160 | AGI Safety FAQ / all-dumb-questions-allowed thread | Aryeh Englander | 6mo | 514
-3 | Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) | Remmelt | 1d | 6
128 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
29 | If Wentworth is right about natural abstractions, it would be bad for alignment | Wuschel Schulz | 12d | 5
73 | Revisiting algorithmic progress | Tamay | 7d | 6
44 | Predicting GPU performance | Marius Hobbhahn | 6d | 24
31 | Is the AI timeline too short to have children? | Yoreth | 6d | 20
5 | What about non-degree seeking? | Lao Mein | 3d | 5
6 | [ASoT] Reflectivity in Narrow AI | Ulisse Mini | 29d | 1
28 | Where to be an AI Safety Professor | scasper | 13d | 12
39 | Proper scoring rules don’t guarantee predicting fixed points | Johannes_Treutlein | 4d | 2
16 | Is the "Valley of Confused Abstractions" real? | jacquesthibs | 15d | 9
96 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
17 | Latent Adversarial Training | Adam Jermyn | 5mo | 9
2 | Vanessa Kosoy's PreDCA, distilled | Martín Soto | 1mo | 17
93 | Infra-Bayesian physicalism: a formal theory of naturalized induction | Vanessa Kosoy | 1y | 20
96 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48
109 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
4 | Guardian AI (Misaligned systems are all around us.) | Jessica Mary | 25d | 6
54 | Neural Tangent Kernel Distillation | Thomas Larsen | 2mo | 20
20 | Why I'm Working On Model Agnostic Interpretability | Jessica Mary | 1mo | 9