Tags: Truthful AI, AI, Autonomy and Choice
Karma | Title | Author | Posted | Comments
----- | ----- | ------ | ------ | --------
25  | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
45  | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
35  | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
99  | Trying to disambiguate different questions about whether RLHF is "good" | Buck | 6d | 39
136 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
82  | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53
213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
39  | In defense of probably wrong mechanistic models | evhub | 14d | 10
64  | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
32  | Concept extrapolation for hypothesis generation | Stuart_Armstrong | 8d | 2
65  | Update to Mysteries of mode collapse: text-davinci-002 not RLHF | janus | 1mo | 8
85  | How could we know that an AGI system will have good consequences? | So8res | 1mo | 24
85  | Automating Auditing: An ambitious concrete technical research proposal | evhub | 1y | 9
77  | Response to Katja Grace's AI x-risk counterarguments | Erik Jenner | 2mo | 18