Mapping the political landscape from data traces: multidimensional opinions of users, politicians and media outlets on X

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Studying political activity on social media often requires defining and measuring political stances of users or content. Relevant examples include the study of opinion polarization, or the study of political diversity in online content diets. While many research designs rely on operationalizations best suited for the US setting, few allow addressing more general political systems, in which users and media outlets might exhibit stances on multiple ideology and issue dimensions, going beyond traditional Liberal-Conservative or Left-Right scales. To advance the study of more general online ecosystems, we present a dataset pertaining to a population of X/Twitter users, parliamentarians, and media outlets embedded in a political space spanned by dimensions measuring attitudes towards immigration, the EU, liberal values, elites and institutions, nationalism and the environment, in addition to left-right and liberal-conservative scales. We include indicators of individual activity and popularity: mean number of posts per day, number of followers, and number of followees. We provide several benchmarks validating the positions of these entities and discuss several applications for this dataset.


💡 Research Summary

The paper presents a large‑scale, publicly released dataset that captures multidimensional political opinions of French‑language X (formerly Twitter) users, members of parliament (MPs), and media outlets. Recognizing the limitations of the dominant one‑dimensional (left‑right or liberal‑conservative) scaling approaches, which are largely rooted in US‑centric research, the authors construct a political space spanned by sixteen ideology and issue dimensions: immigration, European Union integration, liberal values, anti‑elite sentiment, nationalism, environmental policy, and the traditional left‑right and GAL‑TAN (Green/Alternative/Libertarian versus Traditional/Authoritarian/Nationalist) axes, among others.

Data collection was performed in February 2023, after the X API was locked behind a paywall, using the open‑source crawler “minet”. All 886 French MPs with X accounts were identified (883 remained after filtering) and their followers were harvested, yielding a bipartite follower network of 978 206 regular users. To focus on politically engaged accounts, users were required to follow at least three MPs and to have at least 25 followers, a filter intended to remove bots and inactive accounts while preserving a substantial “knowledgeable” user base. The resulting directed bipartite graph contains 9.6 million edges, with MPs averaging an in‑degree of 10 910 and users an out‑degree of about 10.
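The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `filter_users`, the edge-list layout, and the toy account names are all assumptions made for the example; only the two thresholds (three MPs, 25 followers) come from the summary.

```python
from collections import defaultdict

def filter_users(follow_edges, follower_counts, min_mps=3, min_followers=25):
    """Keep edges of users who follow >= min_mps MPs and have >= min_followers followers.

    follow_edges: list of (user_id, mp_id) pairs (hypothetical layout).
    follower_counts: dict mapping user_id -> number of followers.
    """
    mps_followed = defaultdict(set)
    for user, mp in follow_edges:
        mps_followed[user].add(mp)
    kept = {
        user
        for user, mps in mps_followed.items()
        if len(mps) >= min_mps and follower_counts.get(user, 0) >= min_followers
    }
    # Preserve the original edge order for the users that pass both thresholds.
    return [(user, mp) for user, mp in follow_edges if user in kept]

# Toy data: u2 follows too few MPs, u3 has too few followers.
edges = [
    ("u1", "mp_a"), ("u1", "mp_b"), ("u1", "mp_c"),
    ("u2", "mp_a"),
    ("u3", "mp_a"), ("u3", "mp_b"), ("u3", "mp_c"),
]
followers = {"u1": 120, "u2": 300, "u3": 4}
filtered = filter_users(edges, followers)
```

In this toy example only `u1` passes both thresholds, so only its three edges remain.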

The core inference method follows Ramaciotti et al. (2022) and Barberá (2015). It treats the observed follow links as outcomes of a probabilistic homophily law: the probability that user i follows MP j is a logistic function of the latent ideological distance between them, combined with individual propensity parameters (α for following, β for being followed) and a shape parameter γ that balances homophily against idiosyncratic effects. Rather than running computationally intensive MCMC, the authors approximate the latent positions (ϕ) via Correspondence Analysis (CA) on the adjacency matrix, a much faster approximation that scales to large networks.
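To make the two ingredients concrete, the sketch below writes out a follow probability of the form logistic(α_i + β_j − γ‖ϕ_i − ϕ_j‖²) and a textbook Correspondence Analysis via SVD of the standardized residuals. Both are assumptions for illustration: the use of a squared Euclidean distance follows the usual Barberá-style formulation rather than anything stated in the summary, and the function names and toy matrix are hypothetical.

```python
import numpy as np

def follow_probability(alpha_i, beta_j, gamma, phi_i, phi_j):
    """Logistic homophily law: the follow probability falls as latent distance grows."""
    diff = np.asarray(phi_i, dtype=float) - np.asarray(phi_j, dtype=float)
    dist2 = float(np.sum(diff ** 2))  # squared Euclidean distance (assumed form)
    return 1.0 / (1.0 + np.exp(-(alpha_i + beta_j - gamma * dist2)))

def correspondence_analysis(adjacency, n_dims):
    """Return principal coordinates for rows (users) and columns (MPs)."""
    A = np.asarray(adjacency, dtype=float)
    P = A / A.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses, must all be positive
    c = P.sum(axis=0)                        # column masses, must all be positive
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :n_dims] * s[:n_dims] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :n_dims] * s[:n_dims] / np.sqrt(c)[:, None]
    return rows, cols

# Toy user-by-MP follow matrix with two loosely connected camps.
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
user_pos, mp_pos = correspondence_analysis(toy, n_dims=2)
```

On the toy matrix, the first CA dimension places the two camps of users (and the MPs they follow) on opposite sides of the origin. The sign of each axis is arbitrary, which is part of the identification problem addressed by the survey-based mapping.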

The latent space is high‑dimensional, but the number of dimensions that can be mapped to survey scales is bounded by the number of parties identified in both the network and the external survey data: an affine map from d latent dimensions has d + 1 free parameters per target scale, so it cannot be pinned down by fewer than d + 1 parties. Eleven French parties are represented, allowing the authors to fit affine transformations using the first ten (for CHES 2019) or eight (for CHES 2023) latent dimensions. Party‑level latent coordinates are obtained by averaging the positions of the MPs belonging to each party. The continuous scores of the same parties in the Chapel Hill Expert Survey (CHES), which provides 0‑10 scales for a rich set of ideological and issue dimensions (51 in 2019, 11 in 2023), are then regressed on these party vectors. A ridge regression with regularization strength 1.0 (the penalty is also written α, but it is unrelated to the following propensity α above) yields an affine mapping that aligns the latent coordinates with the survey‑based axes, resolving the identification problem (rotation, translation) inherent in latent space models.
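The party-averaging and ridge-mapping steps can be sketched as follows. This is an illustrative closed-form ridge with an unpenalized intercept, written in NumPy; it is assumed, not confirmed, to match the authors' implementation. All names (`party_means`, `fit_ridge_affine`, `apply_affine`) and the synthetic numbers standing in for CA coordinates and CHES scores are hypothetical.

```python
import numpy as np

def party_means(mp_positions, mp_party):
    """Average MP latent positions per party; returns (party names, means array)."""
    parties = sorted(set(mp_party))
    means = np.array([
        np.mean([pos for pos, p in zip(mp_positions, mp_party) if p == party], axis=0)
        for party in parties
    ])
    return parties, means

def fit_ridge_affine(X, Y, penalty=1.0):
    """Closed-form ridge on centered data, so the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def apply_affine(positions, W, b):
    """Map latent positions to survey-like scales."""
    return np.asarray(positions, dtype=float) @ W + b

# Synthetic stand-ins: 22 MPs in 11 parties, 2 latent dimensions, 3 target scales.
rng = np.random.default_rng(0)
mp_positions = rng.normal(size=(22, 2))
mp_party = [f"party_{k % 11}" for k in range(22)]
parties, X = party_means(mp_positions, mp_party)
true_W = rng.normal(size=(2, 3))
true_b = np.array([5.0, 4.0, 6.0])
Y = X @ true_W + true_b                  # synthetic "survey" scores, not CHES data
W, b = fit_ridge_affine(X, Y, penalty=1.0)
user_scores = apply_affine(rng.normal(size=(5, 2)), W, b)
```

Because the mapping is fitted on party-level points only, the penalty mainly guards against overfitting when the number of latent dimensions approaches the number of parties.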

Applying the learned transformation to every user and MP produces a set of continuous scores on the selected CHES dimensions: left‑right, GAL‑TAN, EU integration, anti‑elite sentiment, immigration, nationalism, environmental policy, etc. In addition to these political coordinates, the dataset includes three activity/popularity metrics for each user (average daily posts, follower count, followee count) and analogous popularity measures for media domains (share counts). Media positions are derived by averaging the political scores of users who posted URLs from each domain; the resulting domain coordinates align well with prior classifications of French media on the left‑right spectrum.
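The media-positioning step, averaging the scores of users who posted URLs from a domain, might look like the sketch below. It assumes each distinct user counts once per domain regardless of how many links they shared; the summary does not say whether the authors weight by share count. The function names and the `.example` domains are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def domain_of(url):
    """Extract a lowercase host name, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def domain_positions(shares, user_scores):
    """Average the score vectors of the distinct users who shared each domain.

    shares: list of (user_id, url) pairs (hypothetical layout).
    user_scores: dict mapping user_id -> list of political scores.
    """
    users_by_domain = defaultdict(set)
    for user, url in shares:
        if user in user_scores:          # skip users without inferred positions
            users_by_domain[domain_of(url)].add(user)
    positions = {}
    for domain, users in users_by_domain.items():
        vectors = [user_scores[u] for u in users]
        positions[domain] = [sum(col) / len(col) for col in zip(*vectors)]
    return positions

user_scores = {"u1": [2.0, 3.0], "u2": [4.0, 5.0], "u3": [8.0, 7.0]}
shares = [
    ("u1", "https://www.outlet-a.example/politics/1"),
    ("u2", "https://outlet-a.example/world/2"),
    ("u2", "https://outlet-a.example/world/3"),
    ("u3", "https://outlet-b.example/story"),
]
positions = domain_positions(shares, user_scores)
```

Counting each user once keeps a few very active accounts from dominating a domain's position; weighting by share count would be an equally plausible alternative.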

Validation is performed on two fronts. First, the authors annotated a sample of user profile bios for political stance, using both human coders and a generative‑AI pipeline (GPT‑4). Correlations between these external labels and the inferred scores exceed 0.7, indicating that the follow‑based inference captures self‑reported ideology. Second, the media domain positions are compared against existing literature on French press bias; the alignment is strong, especially on the traditional left‑right axis and on issue‑specific dimensions such as immigration.
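A check of this kind reduces to correlating annotated labels with inferred scores. The sketch below uses a plain Pearson correlation, which is an assumption: the summary does not state which correlation coefficient was used. The numeric coding of the labels (−1 left, 0 center, 1 right) and the toy values are likewise hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical bio annotations and inferred left-right scores on a 0-10 scale.
annotated = [-1, -1, 0, 1, 1]
inferred = [2.1, 3.0, 5.2, 7.4, 8.8]
r = pearson(annotated, inferred)
```

With ordinal labels like these, a rank-based coefficient would be a reasonable alternative to Pearson.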

The final released package comprises: (i) a CSV/JSON file with roughly 978 000 anonymized user records, each containing a unique ID, the sixteen political scores, and activity metrics; (ii) a similar file for the 883 MPs; (iii) a file for ~400 media domains with their political coordinates and share statistics; (iv) the full codebase (Python, R) for data collection, CA, ridge mapping, and validation, hosted on GitHub under an open‑source license.

Limitations are openly discussed. The reliance on follow relationships assumes that following behavior reflects political alignment, which may not hold for strategic follows or curiosity‑driven follows. The methodology is calibrated to French party structures and CHES surveys; transferring it to other countries would require comparable expert surveys and a mapping of local parties. Finally, the dataset reflects a snapshot from early 2023; future changes in the X platform, API policies, or political events could affect the stability of the inferred positions.

Overall, this work delivers the first publicly available, high‑granularity dataset that simultaneously provides continuous multidimensional political positions and rich activity/popularity metadata for a near‑million‑scale population of users, legislators, and media outlets. It opens new avenues for research on political polarization, media bias, echo‑chamber formation, and the interplay between online activity and ideological stance across multiple issue dimensions.

