Tags similar to: SERI MATS

Infra-Bayesianism

Interpretability (ML & AI)

Agency

Distillation & Pedagogy

Inner Alignment

Machine Learning (ML)

Language Models

Outer Alignment

Utility Functions

Complexity of Value

Eliciting Latent Knowledge (ELK)

Goal-Directedness

Distributional Shifts

Mesa-Optimization

Intellectual Progress (Individual-Level)

Research Agendas

Information Theory

Self Fulfilling/Refuting Prophecies

AI Success Models