Karma | Title | Author | Posted | Comments
----- | ----- | ------ | ------ | --------
62 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20
37 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6
21 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner | 19h | 0
232 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5
92 | Trying to disambiguate different questions about whether RLHF is "good" | Buck | 6d | 39
159 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77
42 | High-level hopes for AI alignment | HoldenKarnofsky | 5d | 3
37 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15
30 | Take 10: Fine-tuning with RLHF is aesthetically unsatisfying. | Charlie Steiner | 7d | 3
56 | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23
68 | Why Would AI "Aim" To Defeat Humanity? | HoldenKarnofsky | 21d | 9
41 | In defense of probably wrong mechanistic models | evhub | 14d | 10
65 | Distinguishing test from training | So8res | 21d | 10
20 | Concept extrapolation for hypothesis generation | Stuart_Armstrong | 8d | 2