593 posts
AI
Social Media
Autonomy and Choice
Truthful AI
27 posts
Eliciting Latent Knowledge (ELK)
25
Existential AI Safety is NOT separate from near-term applications
scasper
7d
15
45
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
35
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
99
Trying to disambiguate different questions about whether RLHF is “good”
Buck
6d
39
136
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong
14d
77
82
A shot at the diamond-alignment problem
TurnTrout
2mo
53
213
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
39
In defense of probably wrong mechanistic models
evhub
14d
10
64
Verification Is Not Easier Than Generation In General
johnswentworth
14d
23
32
Concept extrapolation for hypothesis generation
Stuart_Armstrong
8d
2
65
Update to Mysteries of mode collapse: text-davinci-002 not RLHF
janus
1mo
8
85
How could we know that an AGI system will have good consequences?
So8res
1mo
24
85
Automating Auditing: An ambitious concrete technical research proposal
evhub
1y
9
77
Response to Katja Grace's AI x-risk counterarguments
Erik Jenner
2mo
18
113
Mechanistic anomaly detection and ELK
paulfchristiano
25d
17
106
Finding gliders in the game of life
paulfchristiano
19d
7
128
ELK prize results
paulfchristiano
9mo
50
67
Where I currently disagree with Ryan Greenblatt’s version of the ELK approach
So8res
2mo
7
9
Bounded complexity of solving ELK and its implications
Rubi J. Hudson
5mo
4
46
Some Hacky ELK Ideas
johnswentworth
10mo
8
18
Towards a better circuit prior: Improving on ELK state-of-the-art
evhub
8mo
0
64
Counterexamples to some ELK proposals
paulfchristiano
11mo
10
23
Musings on the Speed Prior
evhub
9mo
4
46
Eliciting Latent Knowledge Via Hypothetical Sensors
John_Maxwell
11mo
2
45
What Does The Natural Abstraction Framework Say About ELK?
johnswentworth
10mo
0
34
For ELK truth is mostly a distraction
c.trout
1mo
0
115
Prizes for ELK proposals
paulfchristiano
11mo
156
25
ELK contest submission: route understanding through the human ontology
Vika
9mo
2