AI (593 posts)

Karma | Title                                                                         | Author           | Posted | Comments
------|-------------------------------------------------------------------------------|------------------|--------|---------
   45 | Towards Hodge-podge Alignment                                                 | Cleo Nardo       | 1d     | 20
   30 | Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems. | Charlie Steiner  | 19h    | 0
   35 | The "Minimal Latents" Approach to Natural Abstractions                        | johnswentworth   | 22h    | 6
  213 | AI alignment is distinct from its near-term applications                      | paulfchristiano  | 7d     | 5
   99 | Trying to disambiguate different questions about whether RLHF is “good”       | Buck             | 6d     | 39
   55 | High-level hopes for AI alignment                                             | HoldenKarnofsky  | 5d     | 3
  136 | Using GPT-Eliezer against ChatGPT Jailbreaking                                | Stuart_Armstrong | 14d    | 77
   37 | Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.                 | Charlie Steiner  | 7d     | 3
   64 | Verification Is Not Easier Than Generation In General                         | johnswentworth   | 14d    | 23
   32 | Concept extrapolation for hypothesis generation                               | Stuart_Armstrong | 8d     | 2
   25 | Existential AI Safety is NOT separate from near-term applications             | scasper          | 7d     | 15
   71 | Why Would AI "Aim" To Defeat Humanity?                                        | HoldenKarnofsky  | 21d    | 9
   67 | Distinguishing test from training                                             | So8res           | 21d    | 10
   39 | In defense of probably wrong mechanistic models                               | evhub            | 14d    | 10