593 posts
AI
Social Media
Autonomy and Choice
Truthful AI
27 posts
Eliciting Latent Knowledge (ELK)
49
Existential AI Safety is NOT separate from near-term applications
scasper
7d
15
79
Towards Hodge-podge Alignment
Cleo Nardo
1d
20
39
The "Minimal Latents" Approach to Natural Abstractions
johnswentworth
22h
6
85
Trying to disambiguate different questions about whether RLHF is “good”
Buck
6d
39
182
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong
14d
77
72
A shot at the diamond-alignment problem
TurnTrout
2mo
53
251
AI alignment is distinct from its near-term applications
paulfchristiano
7d
5
43
In defense of probably wrong mechanistic models
evhub
14d
10
48
Verification Is Not Easier Than Generation In General
johnswentworth
14d
23
8
Concept extrapolation for hypothesis generation
Stuart_Armstrong
8d
2
73
Update to Mysteries of mode collapse: text-davinci-002 not RLHF
janus
1mo
8
87
How could we know that an AGI system will have good consequences?
So8res
1mo
24
69
Automating Auditing: An ambitious concrete technical research proposal
evhub
1y
9
73
Response to Katja Grace's AI x-risk counterarguments
Erik Jenner
2mo
18
129
Mechanistic anomaly detection and ELK
paulfchristiano
25d
17
76
Finding gliders in the game of life
paulfchristiano
19d
7
132
ELK prize results
paulfchristiano
9mo
50
59
Where I currently disagree with Ryan Greenblatt’s version of the ELK approach
So8res
2mo
7
11
Bounded complexity of solving ELK and its implications
Rubi J. Hudson
5mo
4
22
Some Hacky ELK Ideas
johnswentworth
10mo
8
20
Towards a better circuit prior: Improving on ELK state-of-the-art
evhub
8mo
0
36
Counterexamples to some ELK proposals
paulfchristiano
11mo
10
15
Musings on the Speed Prior
evhub
9mo
4
30
Eliciting Latent Knowledge Via Hypothetical Sensors
John_Maxwell
11mo
2
23
What Does The Natural Abstraction Framework Say About ELK?
johnswentworth
10mo
0
30
For ELK truth is mostly a distraction
c.trout
1mo
0
167
Prizes for ELK proposals
paulfchristiano
11mo
156
17
ELK contest submission: route understanding through the human ontology
Vika
9mo
2