Akira Matsui #, Emily Chen #, Yunwen Wang, Emilio Ferrara

# Contributed equally.

1 Department of Computer Science, University of Southern California, Los Angeles, California, United States.
2 Information Sciences Institute, University of Southern California, Marina del Rey, California, United States.
3 Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, United States.

PeerJ. 2021 Sep 15;9:e11999. doi: 10.7717/peerj.11999.

The peer-reviewing process has long been regarded as an indispensable tool for ensuring the quality of scientific publications. While previous studies have tried to understand the process as a whole, little effort has been devoted to investigating the determinants and impacts of the content of the peer reviews themselves. This study leverages open data from nearly 5,000 PeerJ publications that were eventually accepted. Using sentiment analysis, Latent Dirichlet Allocation (LDA) topic modeling, mixed linear regression models, and logit regression models, we examine how the peer-reviewing process influences the acceptance timeline and contribution potential of manuscripts, and what modifications were typically made to manuscripts prior to publication. Our findings indicate that, in an open review paradigm, peer reviewers’ choice to reveal their names rather than remain anonymous may be associated with more positive sentiment in their reviews, implying possible social pressure from name association. We also build a taxonomy of the modifications made to manuscripts during revision by studying the words added in response to peer reviewer feedback. This study provides insights into the content of peer reviews and the subsequent modifications authors make to their manuscripts.