When we use language models to measure politics, or ask what politics they already contain, what are we actually measuring? This is hands-on interpretability and evaluation work. I study how LLMs represent political ideology internally, using sparse autoencoders and representation analysis, and I show where LLM-based measurement falls short on construct validity: models key on surface features of the text rather than the construct they are meant to capture. The work is grounded in mechanistic interpretability, including graduate training in Neural Mechanics with David Bau.
- Stance Is Not a Construct (Working Paper): validity gaps in LLM annotation of political attitudes
- The Political Geometry of Ideology in LLMs (Working Paper): how models encode ideology in representation space