Tags similar to: SERI MATS

Distillation & Pedagogy

Language Models

Outer Alignment

Utility Functions

Complexity of Value

Inner Alignment

Interpretability (ML & AI)

Goal-Directedness

Research Agendas

Eliciting Latent Knowledge (ELK)

Self Fulfilling/Refuting Prophecies

AI Success Models

Machine Learning (ML)

AI Boxing (Containment)

Reinforcement Learning

Gradient Hacking

Myopia