74 posts. Tags in this branch:
Infra-Bayesianism
Counterfactuals
Logic & Mathematics
Formal Proof
Domain Theory
Functional Decision Theory
Counterfactual Mugging
Newcomb's Problem
Futarchy
Ontological Crisis
Meta-Honesty
Intelligence Explosion
40 posts. Tags in this branch:
Interviews
Audio
Redwood Research
AXRP
Transcripts
Adversarial Examples
Adversarial Training
AI Robustness
Autonomous Weapons
Karma | Title | Author | Age | Comments
121 | Introduction To The Infra-Bayesianism Sequence | Diffractor | 2y | 64
98 | Infra-Bayesian physicalism: a formal theory of naturalized induction | Vanessa Kosoy | 1y | 20
95 | Counterfactual Mugging Poker Game | Scott Garrabrant | 4y | 2
93 | Zoom In: An Introduction to Circuits | evhub | 2y | 11
69 | Recent Progress in the Theory of Neural Networks | interstice | 3y | 9
60 | MIRI/OP exchange about decision theory | Rob Bensinger | 1y | 7
49 | Complete Class: Consequentialist Foundations | abramdemski | 4y | 34
44 | Probability as Minimal Map | johnswentworth | 3y | 10
41 | Infra-Exercises, Part 1 | Diffractor | 3mo | 9
41 | The Promise and Peril of Finite Sets | davidad | 1y | 4
39 | Self-confirming predictions can be arbitrarily bad | Stuart_Armstrong | 3y | 11
39 | Hessian and Basin volume | Vivek Hebbar | 5mo | 9
38 | Counterfactuals, thick and thin | Nisan | 4y | 11
36 | AXRP Episode 5 - Infra-Bayesianism with Vanessa Kosoy | DanielFilan | 1y | 12
Karma | Title | Author | Age | Comments
170 | Redwood Research’s current project | Buck | 1y | 29
134 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 3mo | 9
129 | Why I'm excited about Redwood Research's current project | paulfchristiano | 1y | 6
118 | Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley | maxnadeau | 1mo | 14
106 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | LawrenceC | 17d | 9
98 | High-stakes alignment via adversarial training [Redwood Research report] | dmz | 7mo | 29
75 | AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant | DanielFilan | 1y | 2
73 | Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small | KevinRoWang | 1mo | 5
54 | Redwood's Technique-Focused Epistemic Strategy | adamShimi | 1y | 1
51 | A conversation about Katja's counterarguments to AI risk | Matthew Barnett | 2mo | 9
50 | AXRP Episode 4 - Risks from Learned Optimization with Evan Hubinger | DanielFilan | 1y | 10
47 | AXRP Episode 10 - AI’s Future and Impacts with Katja Grace | DanielFilan | 1y | 2
46 | AXRP Episode 7 - Side Effects with Victoria Krakovna | DanielFilan | 1y | 6
46 | Rohin Shah on reasons for AI optimism | abergal | 3y | 58