Tree of Tags

0 posts Truthful AI

593 posts AI Autonomy and Choice

79 Towards Hodge-podge Alignment

Cleo Nardo

1d

20

39 The "Minimal Latents" Approach to Natural Abstractions

johnswentworth

22h

6

251 AI alignment is distinct from its near-term applications

paulfchristiano

7d

5

12 Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner

19h

0

85 Trying to disambiguate different questions about whether RLHF is “good”

Buck

6d

39

182 Using GPT-Eliezer against ChatGPT Jailbreaking

Stuart_Armstrong

14d

77

49 Existential AI Safety is NOT separate from near-term applications

scasper

7d

15

29 High-level hopes for AI alignment

HoldenKarnofsky

5d

3

23 Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner

7d

3

503 (My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen

3mo

83

48 Verification Is Not Easier Than Generation In General

johnswentworth

14d

23

43 In defense of probably wrong mechanistic models

evhub

14d

10

65 Why Would AI "Aim" To Defeat Humanity?

HoldenKarnofsky

21d

9

63 Distinguishing test from training

So8res

21d

10