How Instruction Tuning Builds an Authority-Compliance Circuit
Accepted at ACL EvalEval 2026. Identified a layer-8 causal handle robust across 5 seeds (mean shift...
First Author, Feb 2026 - May 2026
I research alignment, evaluation, and model behavior so LLMs can be trusted in production, run with less waste, and cost less to use well.
Accepted at EACL 2026 SRW and ICLR 2026 CAO. Hallucination appeared before malicious behavior.