74 posts · Infra-Bayesianism, Counterfactuals, Logic & Mathematics, Formal Proof, Domain Theory, Functional Decision Theory, Counterfactual Mugging, Newcomb's Problem, Futarchy, Ontological Crisis, Meta-Honesty, Intelligence Explosion
40 posts · Interviews, Audio, Redwood Research, AXRP, Transcripts, Adversarial Examples, Adversarial Training, AI Robustness, Autonomous Weapons
Karma · Title · Author · Age · Comments
98 · Infra-Bayesian physicalism: a formal theory of naturalized induction · Vanessa Kosoy · 1y · 20
12 · Counterfactuals are Confusing because of an Ontological Shift · Chris_Leong · 4mo · 35
44 · Probability as Minimal Map · johnswentworth · 3y · 10
39 · Hessian and Basin volume · Vivek Hebbar · 5mo · 9
121 · Introduction To The Infra-Bayesianism Sequence · Diffractor · 2y · 64
41 · Infra-Exercises, Part 1 · Diffractor · 3mo · 9
21 · A Brief Intro to Domain Theory · Diffractor · 3y · 4
6 · Third-person counterfactuals · Benya_Fallenstein · 7y · 0
9 · The odd counterfactuals of playing chicken · Benya_Fallenstein · 7y · 0
1 · Un-manipulable counterfactuals · Stuart_Armstrong · 7y · 0
0 · Orthogonality: action counterfactuals · Stuart_Armstrong · 7y · 0
19 · Optimal and Causal Counterfactual Worlds · Scott Garrabrant · 7y · 0
8 · Provability Counterfactuals vs Three Axioms of Galles and Pearl · IAFF-User-52 · 7y · 0
4 · Logical Counterfactuals Consistent Under Self-Modification · abramdemski · 7y · 0
106 · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] · LawrenceC · 17d · 9
118 · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley · maxnadeau · 1mo · 14
18 · Latent Adversarial Training · Adam Jermyn · 5mo · 9
134 · Takeaways from our robust injury classifier project [Redwood Research] · dmz · 3mo · 9
73 · Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small · KevinRoWang · 1mo · 5
98 · High-stakes alignment via adversarial training [Redwood Research report] · dmz · 7mo · 29
54 · Redwood's Technique-Focused Epistemic Strategy · adamShimi · 1y · 1
18 · AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler · DanielFilan · 4mo · 0
16 · AXRP Episode 1 - Adversarial Policies with Adam Gleave · DanielFilan · 1y · 5
27 · [AN #146]: Plausible stories of how we might fail to avert an existential catastrophe · Rohin Shah · 1y · 1
18 · AXRP Episode 2 - Learning Human Biases with Rohin Shah · DanielFilan · 1y · 0
46 · AXRP Episode 7 - Side Effects with Victoria Krakovna · DanielFilan · 1y · 6
9 · AXRP Episode 18 - Concept Extrapolation with Stuart Armstrong · DanielFilan · 3mo · 1
34 · AXRP Episode 7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra · DanielFilan · 1y · 1