Research

Safer, more efficient, and more economical language models.

I research alignment, evaluation, and model behavior so LLMs can be trusted in production, run with less waste, and cost less to use well.

Apr 2026

How Instruction Tuning Builds an Authority-Compliance Circuit

A layer-8 causal handle for authority framing, constructed during instruction tuning. ACL EvalEval 2026.

First Author, Feb 2026 - May 2026

Feb 2026

First-author study of how misaligned traits emerge during training. EACL 2026 SRW and ICLR 2026 CAO.

First Author, May 2025 - Dec 2025