AI Interpretability & Evaluation

When we use language models to measure politics, or ask what politics they already contain, what are we actually measuring? This is hands-on interpretability and evaluation work. I study how LLMs represent political ideology internally, using sparse autoencoders and representation analysis, and I show where LLM-based measurement lacks construct validity: models key on surface features rather than the construct they are meant to capture. The work is grounded in mechanistic interpretability, including graduate training in Neural Mechanics with David Bau.

Stance Is Not a Construct (Working Paper): validity gaps in LLM annotation of political attitudes
The Political Geometry of Ideology in LLMs (Working Paper): how models encode ideology in representation space

Y. Emre Tapan
