AI (934 posts)
Subtopics: Value Learning, Embedded Agency, Community, Eliciting Latent Knowledge (ELK), Reinforcement Learning, Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Interviews, AI Capabilities, Inverse Reinforcement Learning
AI Timelines (80 posts)
Subtopics: AI Takeoff, AI Persuasion, History, Forecasting & Prediction, Dialogue (format), Technological Forecasting, Forecasts (Specific Predictions), Industrial Revolution, Effective Altruism, Transformative AI, Progress Studies
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 49 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15 |
| 79 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 271 | Reward is not the optimization target | TurnTrout | 4mo | 97 |
| 39 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 85 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 182 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 72 | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53 |
| 67 | Don't design agents which exploit adversarial inputs | TurnTrout | 1mo | 61 |
| 129 | Mechanistic anomaly detection and ELK | paulfchristiano | 25d | 17 |
| 42 | Four usages of "loss" in AI | TurnTrout | 2mo | 18 |
| 87 | Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility | Akash | 28d | 20 |
| 154 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 251 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 43 | In defense of probably wrong mechanistic models | evhub | 14d | 10 |
| 104 | Applying superintelligence without collusion | Eric Drexler | 1mo | 56 |
| 197 | Yudkowsky and Christiano discuss "Takeoff Speeds" | Eliezer Yudkowsky | 1y | 181 |
| 182 | What does it take to defend the world against out-of-control AGIs? | Steven Byrnes | 1mo | 31 |
| 13 | How promising are legal avenues to restrict AI training data? | thehalliard | 10d | 2 |
| 315 | Two-year update on my personal AI timelines | Ajeya Cotra | 4mo | 60 |
| 486 | What 2026 looks like | Daniel Kokotajlo | 1y | 98 |
| 110 | How might we align transformative AI if it’s developed very soon? | HoldenKarnofsky | 3mo | 17 |
| 198 | Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain | Daniel Kokotajlo | 1y | 85 |
| 84 | Disagreement with bio anchors that lead to shorter timelines | Marius Hobbhahn | 1mo | 16 |
| 37 | AGI Timelines Are Mostly Not Strategically Relevant To Alignment | johnswentworth | 3mo | 35 |
| 118 | Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon | johnswentworth | 8mo | 130 |
| 84 | AI strategy nearcasting | HoldenKarnofsky | 3mo | 3 |
| 62 | A review of the Bio-Anchors report | jylin04 | 2mo | 4 |
| 6 | Law-Following AI 4: Don't Rely on Vicarious Liability | Cullen_OKeefe | 4mo | 2 |