Flaminio Squazzoni  flaminio.squazzoni@unimi.it
ReviewerCredits Scientific Advisor
BEHAVE Lab, Department of Social and Political Sciences, University of Milan, Italy



On 24 July 2020, “Research Integrity and Peer Review” published an interesting article on professionalism in peer review by a team led by Travis G. Gerwing, an ecologist from the University of Victoria, British Columbia, Canada. The study is a good example of a new frontier: studying peer review by analysing the linguistic content of peer review reports. Examining how reviewers write their reports could help reveal how robust the linguistic standards of peer review are and whether different styles exist in academic discourse, not to mention the presence of bias and unprofessionalism. It could also help explore the various dimensions of peer review quality and provide a better assessment of the process.

The authors of this study accessed about 1,500 reports, mainly from Publons or voluntarily shared by a group of early career researchers in the fields of evolutionary ecology and behavioural medicine. Comments were coded for the occurrence of “unprofessional comments” and “incomplete, inaccurate or unsubstantiated critiques” using an a priori rubric based on framework analysis. A group of blinded human coders classified the reports on various linguistic dimensions. Information on manuscripts, authors and journals was removed from the text to preserve anonymity and confidentiality. Presented as absolute numbers and percentages, the results indicate that 12% of comment sets included at least one unprofessional comment towards the authors or their work, while 41% (611) contained incomplete, inaccurate or unsubstantiated critiques. The authors suggested that unprofessional comments could be minimized by the adoption of a peer review code of conduct.

Some days before this study was published, together with a team of co-authors from the University of Split School of Medicine and the University of Valencia, I published a paper in “eLife” based on a sample of 472,449 peer review reports from 61 Elsevier journals. We performed a quantitative analysis of their linguistic content. Given that we employed machine learning techniques and not human coders, we could measure a variety of characteristics (including analytical tone, authenticity, clout, three measures of sentiment, and morality) as a function of reviewer recommendation, area of research, type of peer review and reviewer gender. Compared to Gerwing et al.’s study, we could not develop in-depth expert classifications, but we were at least free to exploit all the information included in such a large dataset. Our results were less critical of the current state of peer review than Gerwing et al.’s. We found that reviewer recommendation had the biggest impact on the linguistic characteristics of reports, and that area of research, type of peer review and reviewer gender had little or no impact. As one would reasonably expect, reports were less emotional and more analytical when suggesting major revisions or rejection, and longer and more elaborate when referees were explaining the revisions needed. We concluded that the lack of influence of research area, type of review or reviewer gender on the linguistic characteristics was a sign of the robustness of peer review.

Obviously, these are two different studies with different purposes, samples and approaches. However, there are some common lessons. The first is that large-scale, cross-journal data are needed for any systematic analysis of report language that aims to draw valid conclusions on the state of peer review. I would sincerely prefer to see more robust statistical analysis than absolute numbers, averages or percentages to support any conclusion on the positive or negative aspects of peer review. This is obviously hard when important factors, e.g., gender, reviewer recommendations, and important manuscript details, are not available or must be removed for confidentiality reasons, as in Gerwing et al.’s study. The second is that peer review is a dialogue: concentrating only on peer review reports without contextual data on manuscripts, reviewers and journals is like dancing tango alone. It is perhaps fun for some, and probably good exercise for keeping in shape, but it misses the beauty and the whole point of tango. For instance, what about the unprofessionalism of many authors that reviewers are asked to tolerate? Questions like this reinforce my view that our capacity to measure the quality and other aspects of peer review, and more generally the whole possibility of assessing the process, will increase when quantitative studies and more qualitative approaches work together. It’s time to bridge the divide between quantitative and qualitative research in this field. Again: it takes two to tango!


  • Buljan, I., Garcia-Costa, D., Grimaldo, F., Squazzoni, F., Marusic, A. (2020) Meta-Research: Large-scale language analysis of peer review reports. eLife 9, e53249. https://elifesciences.org/articles/53249
  • Gerwing, T.G., Allen Gerwing, A.M., Avery-Gomm, S. et al. (2020) Quantifying professionalism in peer review. Research Integrity and Peer Review 5, 9. https://doi.org/10.1186/s41073-020-00096-x