Tags (620 posts): AI, Eliciting Latent Knowledge (ELK), AI Robustness, Truthful AI, Autonomy and Choice, Intelligence Explosion, Social Media, Transcripts

Tags (114 posts): Infra-Bayesianism, Counterfactuals, Interviews, Audio, Logic & Mathematics, AXRP, Redwood Research, Domain Theory, Counterfactual Mugging, Newcomb's Problem, Formal Proof, Functional Decision Theory
| Karma | Title | Author | Posted | Comments |
|---|---|---|---|---|
| 25 | Existential AI Safety is NOT separate from near-term applications | scasper | 7d | 15 |
| 45 | Towards Hodge-podge Alignment | Cleo Nardo | 1d | 20 |
| 35 | The "Minimal Latents" Approach to Natural Abstractions | johnswentworth | 22h | 6 |
| 99 | Trying to disambiguate different questions about whether RLHF is “good” | Buck | 6d | 39 |
| 136 | Using GPT-Eliezer against ChatGPT Jailbreaking | Stuart_Armstrong | 14d | 77 |
| 82 | A shot at the diamond-alignment problem | TurnTrout | 2mo | 53 |
| 113 | Mechanistic anomaly detection and ELK | paulfchristiano | 25d | 17 |
| 213 | AI alignment is distinct from its near-term applications | paulfchristiano | 7d | 5 |
| 39 | In defense of probably wrong mechanistic models | evhub | 14d | 10 |
| 64 | Verification Is Not Easier Than Generation In General | johnswentworth | 14d | 23 |
| 106 | Finding gliders in the game of life | paulfchristiano | 19d | 7 |
| 32 | Concept extrapolation for hypothesis generation | Stuart_Armstrong | 8d | 2 |
| 65 | Update to Mysteries of mode collapse: text-davinci-002 not RLHF | janus | 1mo | 8 |
| 85 | How could we know that an AGI system will have good consequences? | So8res | 1mo | 24 |
| 106 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9 |
| 98 | Infra-Bayesian physicalism: a formal theory of naturalized induction | Vanessa Kosoy | 1y | 20 |
| 118 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14 |
| 18 | Latent Adversarial Training | Adam Jermyn | 5mo | 9 |
| 134 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9 |
| 12 | Counterfactuals are Confusing because of an Ontological Shift | Chris_Leong | 4mo | 35 |
| 44 | Probability as Minimal Map | johnswentworth | 3y | 10 |
| 73 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5 |
| 39 | Hessian and Basin volume | Vivek Hebbar | 5mo | 9 |
| 98 | High-stakes alignment via adversarial training [Redwood Research report] | dmz | 7mo | 29 |
| 121 | Introduction To The Infra-Bayesianism Sequence | Diffractor | 2y | 64 |
| 41 | Infra-Exercises, Part 1 | Diffractor | 3mo | 9 |
| 54 | Redwood's Technique-Focused Epistemic Strategy | adamShimi | 1y | 1 |
| 18 | AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler | DanielFilan | 4mo | 0 |