The VQR, Italy's second national research assessment: Methodological failures and ranking distortions


The 2004-2010 VQR, completed in July 2013, was Italy's second national research assessment exercise. Like similar exercises in other nations, the VQR evaluation was based on a selected subset of research products. In this work we identify the exercise's methodological weaknesses and measure the distortions they produce in the university performance rankings. First, we create a scenario that assumes efficient selection of the products submitted by the universities, and from this simulate a set of rankings applying the precise VQR rating criteria. Next, we compare these "VQR rankings" with those that would derive from applying more appropriate bibliometric indicators. Finally, we extend the comparison to university rankings based on the entire scientific production of the period, as indexed in the Web of Science.


💡 Research Summary

The paper provides a systematic critique of Italy’s 2004‑2010 VQR (Research Quality Evaluation), the country’s second national research assessment, focusing on methodological flaws that distort university performance rankings. VQR, like similar exercises in the UK and elsewhere, evaluated only a selected subset of research outputs; in the hard sciences each professor was required to submit at most three publications from the 2004‑2010 period. The authors argue that this constraint, imposed for budgetary and logistical reasons, severely compromises the reliability and fairness of the resulting rankings.

To quantify the impact, the authors conduct three simulation experiments using the Web of Science (WoS) database. In the first experiment they assume “efficient selection”: for every professor they extract the three highest‑scoring papers according to the exact VQR rating scheme (A = 1.0, B = 0.8, C = 0.5, D = 0). By aggregating these scores they generate a “VQR ranking” that reflects what the official lists would look like if universities always chose their best possible outputs.
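As a minimal sketch of this first experiment, the snippet below shows how "efficient selection" could be simulated, assuming an in-memory list of papers that already carry a VQR grade; the field names (`professor`, `university`, `grade`) and the aggregation into a single university score are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch of the "efficient selection" simulation (illustrative data layout).
from collections import defaultdict

# VQR grade-to-score conversion used in the 2004-2010 exercise.
VQR_SCORE = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.0}

def efficient_selection(papers, k=3):
    """Pick, for every professor, the k papers with the highest VQR score.

    `papers`: iterable of dicts with keys 'professor', 'university',
    'grade' (one of A/B/C/D). Returns a dict university -> total score.
    """
    by_professor = defaultdict(list)
    for p in papers:
        by_professor[(p["university"], p["professor"])].append(VQR_SCORE[p["grade"]])

    university_score = defaultdict(float)
    for (university, _prof), scores in by_professor.items():
        # Efficient selection: each professor submits their k best-rated products.
        university_score[university] += sum(sorted(scores, reverse=True)[:k])
    return university_score

def rank_universities(scores):
    """Order universities from best to worst aggregate score."""
    return sorted(scores, key=scores.get, reverse=True)
```

Summing each professor's three best-rated products and aggregating by institution mirrors the selection logic described above, producing the simulated "VQR ranking".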

The second experiment replaces the VQR’s crude percentile‑based grading with more appropriate bibliometric indicators. Specifically, the authors compute field‑normalized citation impact (FNCI), citation percentiles, and average citations per paper, then select the three best papers per professor based on these metrics. Applying the same four‑grade conversion yields an alternative ranking. The comparison shows that the VQR system systematically under‑values high‑impact work, especially in fields where citation practices differ markedly from the global average.
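A hedged sketch of that bibliometric alternative follows; the normalization (citations divided by the average of papers in the same field and year) and the top-k choice by FNCI are stated assumptions, since the paper's exact indicator definitions are not reproduced here.

```python
# Illustrative sketch of the bibliometric selection (assumed field definitions).
from collections import defaultdict
from statistics import mean

def field_normalized_impact(papers):
    """Attach a field-normalized citation score (FNCI) to every paper.

    `papers`: list of dicts with keys 'field', 'year', 'citations'.
    FNCI = citations / average citations of papers in the same field and year.
    """
    baseline = defaultdict(list)
    for p in papers:
        baseline[(p["field"], p["year"])].append(p["citations"])
    for p in papers:
        avg = mean(baseline[(p["field"], p["year"])])
        p["fnci"] = p["citations"] / avg if avg > 0 else 0.0
    return papers

def select_best_by_fnci(papers, k=3):
    """For each professor (key 'professor'), keep the k papers with highest FNCI."""
    by_prof = defaultdict(list)
    for p in papers:
        by_prof[p["professor"]].append(p)
    return {
        prof: sorted(ps, key=lambda p: p["fnci"], reverse=True)[:k]
        for prof, ps in by_prof.items()
    }
```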

The third experiment removes the three‑paper limit altogether. All papers indexed in WoS for each professor are included, and a composite score is built from average FNCI and total citations. This “full‑output” ranking is then contrasted with the VQR ranking. The analysis reveals that the three‑paper restriction causes an average loss of 23 % to 32 % of the attainable score, with larger institutions suffering the greatest penalty. Consequently, university positions shift dramatically: roughly 40 % of institutions that rank in the top decile under the full‑output metric fall out of that tier when evaluated with the VQR constraint.
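The full-output comparison could look roughly like the sketch below, where the composite score is an assumed weighted mix of average FNCI and (scaled) total citations; the weights `w_fnci` and `w_cites` are illustrative placeholders, not values taken from the paper.

```python
# Sketch of the "full-output" scoring: no cap on submissions.
from collections import defaultdict
from statistics import mean

def full_output_scores(papers, w_fnci=0.5, w_cites=0.5):
    """Score each university over its entire WoS-indexed output.

    `papers`: dicts with 'university', 'fnci', 'citations'.
    Composite = weighted sum of mean FNCI and total citations scaled to [0, 1].
    """
    by_uni = defaultdict(list)
    for p in papers:
        by_uni[p["university"]].append(p)

    max_cites = max(sum(x["citations"] for x in ps) for ps in by_uni.values()) or 1

    scores = {}
    for uni, ps in by_uni.items():
        avg_fnci = mean(x["fnci"] for x in ps)
        total_cites = sum(x["citations"] for x in ps)
        scores[uni] = w_fnci * avg_fnci + w_cites * (total_cites / max_cites)
    return scores
```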

Statistical tests confirm the weakness of the VQR approach: the Pearson correlation between VQR ranks and full‑output ranks is only 0.68, indicating substantial divergence. Moreover, VQR scores correlate weakly with other quality signals such as international collaboration rates and the share of highly cited papers, suggesting that VQR fails to capture key dimensions of research excellence.
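For reference, a rank comparison of this kind can be computed by correlating the two rank vectors; the snippet below is a generic NumPy sketch and does not reproduce the paper's data or its reported 0.68 figure.

```python
# Pearson correlation between two rankings of the same set of universities.
import numpy as np

def ranks(ordering, universities):
    """Map each university to its 1-based position in a best-to-worst ordering."""
    pos = {u: i + 1 for i, u in enumerate(ordering)}
    return np.array([pos[u] for u in universities], dtype=float)

def rank_correlation(vqr_order, full_order):
    """Correlate the rank vectors of two exercises over a common label order."""
    universities = sorted(vqr_order)
    r_vqr = ranks(vqr_order, universities)
    r_full = ranks(full_order, universities)
    return float(np.corrcoef(r_vqr, r_full)[0, 1])
```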

Based on these findings, the authors propose several policy recommendations. First, national assessments should rely on bibliometric methods that can incorporate the entire research output, thereby eliminating the need for arbitrary selection caps. Second, if selection is unavoidable, transparent mechanisms must be introduced to ensure that the most impactful works are chosen. Third, evaluation criteria should be field‑normalized and based on robust citation metrics rather than simple percentile thresholds. Finally, because funding allocations are directly tied to assessment outcomes, any methodological bias in the ranking process can lead to misallocation of public resources, undermining the very goal of performance‑based funding.

In sum, the VQR’s methodological design—particularly the imposed three‑product limit and the simplistic grading rubric—produces significant distortions in university rankings and jeopardizes efficient distribution of research funding. The paper demonstrates that more comprehensive, bibliometric‑driven evaluations would yield fairer, more accurate assessments of institutional research performance and should be adopted in future national exercises.

