1855 posts · Tags: AI, SERI MATS, AI Sentience, Distributional Shifts, AI Robustness, Truthful AI, Adversarial Examples
185 posts · Tags: Careers, Audio, Interviews, Infra-Bayesianism, Organization Updates, AXRP, Formal Proof, Redwood Research, Domain Theory, Adversarial Training
Score | Title | Author | Posted | Comments
531 | (My understanding of) What Everyone in Technical Alignment is Doing and Why | Thomas Larsen | 3mo | 83
436 | How To Get Into Independent Research On Alignment/Agency | johnswentworth | 1y | 33
404 | Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | Ajeya Cotra | 5mo | 89
373 | We Choose To Align AI | johnswentworth | 11mo | 15
331 | What should you change in response to an "emergency"? And AI risk | AnnaSalamon | 5mo | 60
323 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger | 19d | 30
287 | Don't die with dignity; instead play to your outs | Jeffrey Ladish | 8mo | 58
282 | AGI Safety FAQ / all-dumb-questions-allowed thread | Aryeh Englander | 6mo | 514
281 | Ngo and Yudkowsky on alignment difficulty | Eliezer Yudkowsky | 1y | 143
281 | An overview of 11 proposals for building safe advanced AI | evhub | 2y | 36
279 | The Plan | johnswentworth | 1y | 77
276 | DeepMind: Generally capable agents emerge from open-ended play | Daniel Kokotajlo | 1y | 53
271 | larger language models may disappoint you [or, an eternally unfinished draft] | nostalgebraist | 1y | 29
265 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
184 | High-stakes alignment via adversarial training [Redwood Research report] | dmz | 7mo | 29
165 | Curated conversations with brilliant rationalists | spencerg | 1y | 18
164 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
159 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
157 | Understanding Infra-Bayesianism: A Beginner-Friendly Video Series | Jack Parker | 2mo | 6
156 | Announcing the LessWrong Curated Podcast | Ben Pace | 6mo | 17
151 | An Update on Academia vs. Industry (one year into my faculty job) | David Scott Krueger (formerly: capybaralet) | 3mo | 18
143 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9
138 | Taking the parameters which seem to matter and rotating them until they don't | Garrett Baker | 3mo | 48
134 | Externalized reasoning oversight: a research direction for language model alignment | tamera | 4mo | 22
124 | Job Offering: Help Communicate Infrabayesianism | abramdemski | 9mo | 21
121 | Redwood Research’s current project | Buck | 1y | 29
114 | Evaluations project @ ARC is hiring a researcher and a webdev/engineer | Beth Barnes | 3mo | 7
108 | I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead | lsusr | 1y | 33