AI (808 posts): Embedded Agency, Eliciting Latent Knowledge (ELK), Reinforcement Learning, Infra-Bayesianism, Counterfactuals, Logic & Mathematics, AI Capabilities, Interviews, Audio, Subagents, Wireheading
Value Learning (126 posts): Inverse Reinforcement Learning, Machine Intelligence Research Institute (MIRI), Agent Foundations, Meta-Philosophy, Metaethics, Community, Philosophy, The Pointers Problem, Moral Uncertainty, Cognitive Reduction, Center for Human-Compatible AI (CHAI)
Karma | Title | Author | Posted | Comments
25 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
45 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
233 | Reward is not the optimization target | TurnTrout | 4mo | 97
35 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
99 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39
136 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
82 | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53
113 | Mechanistic anomaly detection and ELK | paulfchristiano | 25d | 17
42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18
106 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
39 | In defense of probably wrong mechanistic models | evhub | 14d | 10
64 | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
106 | Finding gliders in the game of life | paulfchristiano | 19d | 7
53 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61
51 | Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility | Akash | 28d | 20
39 | People care about each other even though they have imperfect motivational pointers? | TurnTrout | 1mo | 25
20 | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | abramdemski | 5y | 9
45 | Clarifying the Agent-Like Structure Problem | johnswentworth | 2mo | 14
15 | What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment | xuan | 3mo | 15
40 | Beyond Kolmogorov and Shannon | Alexander Gietelink Oldenziel | 1mo | 14
23 | Bridging Expected Utility Maximization and Optimization | Whispermute | 4mo | 5
45 | Prize and fast track to alignment research at ALTER | Vanessa Kosoy | 3mo | 4
12 | The Slippery Slope from DALLE-2 to Deepfake Anarchy | scasper | 1mo | 9
99 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 2y | 43
39 | Announcing the Introduction to ML Safety course | Dan H | 4mo | 6
70 | Encultured AI Pre-planning, Part 1: Enabling New Benchmarks | Andrew_Critch | 4mo | 2
54 | Humans can be assigned any values whatsoever… | Stuart_Armstrong | 4y | 26