Research

Safer, more efficient, and more economical language models.

I research alignment, evaluation, and model behavior so LLMs can be trusted in production, run with less waste, and cost less to use well.

Apr 2026

How Instruction Tuning Builds an Authority-Compliance Circuit

Accepted at ACL EvalEval 2026. Identified a layer-8 causal handle robust across 5 seeds (mean shift...

First Author, Feb 2026 - May 2026
Feb 2026

Emergent Misalignment in LLMs

Accepted at EACL 2026 SRW and ICLR 2026 CAO. Hallucination appeared before malicious behavior.

First Author, May 2025 - Dec 2025