Does the Model Know Who It’s Talking To? Political Sycophancy in LLMs
In Prep
As large language models become increasingly integrated into public discourse, understanding their political behavior is a pressing concern. Prior work has established that LLMs exhibit a default center-left lean, yet they readily shift their expressed positions when prompted by users signaling different political identities. This raises the question of whether these models encode a flexible political worldview or merely mirror user cues while maintaining a stable internal “ideology.” In this work, we use logit lens analysis and representational similarity measures to look beneath the surface of political sycophancy. We find that models largely commit early to sycophantic responses, with little internal deliberation, suggesting that the apparent ideological flexibility of LLMs is shallow accommodation rather than genuine reasoning. These findings have implications for how we assess the political neutrality of these systems.
Concept: Political Sycophancy
Methods: LLM evaluation, Mechanistic Interpretability
Data: Political Profiles, LLM outputs
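The logit lens analysis mentioned above can be sketched as follows: project each layer's intermediate hidden state through the model's unembedding matrix to read off a token distribution, then track when the top prediction stabilizes across layers. This is a minimal illustrative sketch; the dimensions, hidden states, and unembedding matrix are random stand-ins, not the paper's actual models or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for a real transformer (illustrative only).
vocab_size, d_model, n_layers = 50, 16, 4

# Hypothetical per-layer hidden states for one token position; in a real
# model these would be collected via forward hooks during generation.
hidden_states = [rng.normal(size=d_model) for _ in range(n_layers)]

# Unembedding matrix (often the tied output embedding in real LLMs).
W_U = rng.normal(size=(d_model, vocab_size))

def layer_norm(x, eps=1e-5):
    """Final layer norm applied before unembedding (no learned scale/shift here)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def logit_lens(h, W_U):
    """Project an intermediate hidden state through the unembedding to read
    off the token distribution the model 'currently' predicts at that layer."""
    logits = layer_norm(h) @ W_U
    probs = np.exp(logits - logits.max())  # stable softmax
    return probs / probs.sum()

# Track how the top predicted token evolves layer by layer; a top token that
# appears early and never changes indicates early commitment to an answer.
top_tokens = [int(np.argmax(logit_lens(h, W_U))) for h in hidden_states]
print(top_tokens)
```

The diagnostic for early commitment is simply the first layer at which the top token matches the final layer's top token; in the sycophancy setting, the same lens can be applied separately to prompts with and without user political-identity cues.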