TY  - JOUR
AU  - Pier, Elizabeth L.
AU  - Raclaw, Joshua
AU  - Kaatz, Anna
AU  - Brauer, Markus
AU  - Carnes, Molly
AU  - Nathan, Mitchell J.
AU  - Ford, Cecilia E.
AB  - In scientific grant peer review, groups of expert scientists meet to engage in the collaborative decision-making task of evaluating and scoring grant applications. Prior research on grant peer review has established that inter-reviewer reliability is typically poor. In the current study, experienced reviewers for the National Institutes of Health (NIH) were recruited to participate in one of four constructed peer review panel meetings. Each panel discussed and scored the same pool of recently reviewed NIH grant applications. We examined the degree of intra-panel variability in panels' scores of the applications before versus after collaborative discussion, and the degree of inter-panel variability. We also analyzed videotapes of reviewers' interactions for instances of one particular form of discourse—Score Calibration Talk—as one factor influencing the variability we observe. Results suggest that although reviewers within a single panel agree more following collaborative discussion, different panels agree less after discussion, and Score Calibration Talk plays a pivotal role in scoring variability during peer review. We discuss implications of this variability for the scientific peer review process.
DA  - 2017-01-01
DO  - 10.1093/reseval/rvw025
IS  - 1
PY  - 2017
SP  - 1
EP  - 14
ST  - ‘Your comments are meaner than your score’
T2  - Research Evaluation
TI  - ‘Your comments are meaner than your score’: score calibration talk influences intra- and inter-panel variability during scientific grant peer review
UR  - https://doi.org/10.1093/reseval/rvw025
VL  - 26
Y2  - 2026-01-26
SN  - 0958-2029
ER  - 