Tree of Tags

734 posts: AI, Eliciting Latent Knowledge (ELK), Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Interviews, Audio, AXRP, Redwood Research, Transcripts, Formal Proof, Domain Theory

74 posts: Embedded Agency, Reinforcement Learning, Subagents, Reward Functions, EfficientZero, Robust Agents, Wireheading, AI Capabilities, Spurious Counterfactuals, Category Theory, Memetics, Tradeoffs

242 Visible Thoughts Project and Bounty Announcement

So8res

1y

104

241 Discussion with Eliezer Yudkowsky on AGI interventions

Rob Bensinger

1y

257

231 DeepMind: Generally capable agents emerge from open-ended play

Daniel Kokotajlo

1y

53

219 ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano

1y

88

217 larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist

1y

29

213 AI alignment is distinct from its near-term applications

paulfchristiano

7d

5

213 Hiring engineers and researchers to help align GPT-3

paulfchristiano

2y

14

211 Safetywashing

Adam Scholl

5mo

17

205 The Plan

johnswentworth

1y

77

205 Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky

1y

143

202 Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

Zvi

1y

48

194 Announcing the Alignment Research Center

paulfchristiano

1y

6

185 (My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen

3mo

83

177 A note about differential technological development

So8res

5mo

31

259 Humans are very reliable agents

alyssavance

6mo

35

233 Reward is not the optimization target

TurnTrout

4mo

97

212 EfficientZero: How It Works

1a3orn

1y

42

155 Introduction to Cartesian Frames

Scott Garrabrant

2y

29

144 EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

gwern

1y

52

134 Why Subagents?

johnswentworth

3y

42

109 Robust Delegation

abramdemski

4y

10

108 We have achieved Noob Gains in AI

phdead

7mo

21

103 Reward Is Not Enough

Steven Byrnes

1y

18

98 The alignment problem in different capability regimes

Buck

1y

12

95 Embedded Agency (full-text version)

Scott Garrabrant

4y

15

93 Updates and additions to "Embedded Agency"

Rob Bensinger

2y

1

93 Subsystem Alignment

abramdemski

4y

12

93 Humans Are Embedded Agents Too

johnswentworth

2y

19