Tree of Tags

620 posts: AI, Eliciting Latent Knowledge (ELK), AI Robustness, Truthful AI, Autonomy and Choice, Intelligence Explosion, Social Media, Transcripts

114 posts: Infra-Bayesianism, Counterfactuals, Interviews, Audio, Logic & Mathematics, AXRP, Redwood Research, Domain Theory, Counterfactual Mugging, Newcomb's Problem, Formal Proof, Functional Decision Theory

503 (My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen

3mo

83

409 Discussion with Eliezer Yudkowsky on AGI interventions

Rob Bensinger

1y

257

265 The Plan

johnswentworth

1y

77

265 Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky

1y

143

265 An overview of 11 proposals for building safe advanced AI

evhub

2y

36

263 DeepMind: Generally capable agents emerge from open-ended play

Daniel Kokotajlo

1y

53

257 larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist

1y

29

253 Embedded Agents

abramdemski

4y

41

251 AI alignment is distinct from its near-term applications

paulfchristiano

7d

5

248 Visible Thoughts Project and Bounty Announcement

So8res

1y

104

247 Optimality is the tiger, and agents are its teeth

Veedrac

8mo

31

213 Safetywashing

Adam Scholl

5mo

17

206 Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

Zvi

1y

48

205 ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano

1y

88

174 High-stakes alignment via adversarial training [Redwood Research report]

dmz

7mo

29

154 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

LawrenceC

17d

9

150 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau

1mo

14

136 Takeaways from our robust injury classifier project [Redwood Research]

dmz

3mo

9

116 Redwood Research’s current project

Buck

1y

29

99 Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

KevinRoWang

1mo

5

98 Infra-Bayesian physicalism: a formal theory of naturalized induction

Vanessa Kosoy

1y

20

95 Why I'm excited about Redwood Research's current project

paulfchristiano

1y

6

87 Introduction To The Infra-Bayesianism Sequence

Diffractor

2y

64

83 Recent Progress in the Theory of Neural Networks

interstice

3y

9

79 Counterfactual Mugging Poker Game

Scott Garrabrant

4y

2

75 Zoom In: An Introduction to Circuits

evhub

2y

11

57 Infra-Exercises, Part 1

Diffractor

3mo

9

54 Probability as Minimal Map

johnswentworth

3y

10