Trialling better ways to evaluate researchers

In 2014, University of Cambridge computer scientist Neil Lawrence was asked to chair EuroNIPS (Neural Information Processing Systems), an annual machine learning and computational neuroscience conference. In a field where conference proceedings are a common route to publication, fewer than a quarter of the papers submitted for review were awarded a presentation slot – typical for prestigious conferences of this kind. Corinna Cortes, vice-president of Google Research, suggested they take the opportunity to experiment with the peer review process and see what level of consistency it provides. Peer review is considered a ‘gold standard’ in science and generally receives widespread support from researchers, but there is evidence that it may not always be an impartial arbiter of quality. The EuroNIPS conference provided an opportunity to investigate this further.

Lawrence and Cortes convened two reviewing committees, and 10% of the submitted papers (166 papers) were reviewed by both. Ultimately, 43 of those papers received a different decision from each committee, and the committees disagreed on 57% of the 38 papers that were eventually accepted for publication. The outcome is somewhat better than random (you would expect disagreement on 77.5% of accepted papers) – but is that good enough for a decision that can define the career of a young researcher?
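The ‘better than random’ baseline can be checked with a quick simulation (a toy model, not part of the original experiment): if two committees each accepted 22.5% of papers independently and purely at random, a paper accepted by one committee would be rejected by the other about 77.5% of the time.

```python
import random

random.seed(0)
N = 100_000        # simulated submissions (large, for a stable estimate)
P_ACCEPT = 0.225   # overall acceptance rate at the 2014 conference

# Two committees deciding independently and purely at random
committee_a = [random.random() < P_ACCEPT for _ in range(N)]
committee_b = [random.random() < P_ACCEPT for _ in range(N)]

# Of the papers accepted by committee A, what fraction does committee B reject?
accepted_by_a = [b for a, b in zip(committee_a, committee_b) if a]
disagreement = 1 - sum(accepted_by_a) / len(accepted_by_a)
print(f"{disagreement:.1%}")  # close to 77.5%, i.e. 1 - P_ACCEPT
```

The observed 57% disagreement therefore sits between this random baseline and the perfect agreement one might hope for.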

Other studies have shown similar inconsistency. In 2018, 43 reviewers were given the same 25 anonymised US National Institutes of Health grant applications to assess, and showed no agreement in either their quantitative or their qualitative evaluations. The study concluded that two randomly selected ratings of the same application were, on average, no more similar than two randomly selected ratings of two different applications.1

Perhaps it is unreasonable to expect high levels of consistency from peer review. As Lawrence states, ‘you are sampling from three people who are not objective … they’ve got particular opinions’. This may not have once been a problem, but the current competitive nature of academia has made each peer reviewed decision highly significant. ‘If you have a funding rate of 5%, or 10%, you’re going to have very few winners and a lot of losers, and a lot of undeserving losers and some undeserving winners,’ says Johan Bollen, a complex systems expert at Indiana University Bloomington, US, who has been researching alternative models for allocating funding.

Differences stack up through the whole publication process

Aileen Day, Royal Society of Chemistry

Lawrence agrees that the root of the problem is how career-defining peer reviewed decisions can be. ‘Whether [a student] managed to get [their] paper [into the EuroNIPS conference] shouldn’t be the be-all-and-end-all … unfortunately, it often is.’

With grant allocation there is an additional problem: the time wasted writing and peer reviewing proposals that are never funded. By one estimate, Australian researchers spent a total of five centuries of research time preparing the 3,700 proposals submitted in a single year.2 But Bollen says this is not to criticise funding agencies. ‘A lot of really good people are involved … but a system that funds only 15% of the applicants, leaving 85% with zero money, cannot be efficient.’

Unconscious bias

Another charge levelled at peer review is that reviewers inevitably bring unconscious biases – unintended but deep-rooted stereotypes that can influence decisions. In 2019, the Royal Society of Chemistry (RSC) produced a report exploring the gender gap in successful submissions to its journals between 2014 and 2018. RSC data scientist Aileen Day found differences at all stages, including peer review.3 ‘It was small, but it was significant,’ says Day. For example, while 23.9% of corresponding authors who submit papers are female, only 22.9% of papers accepted for publication have female corresponding authors. ‘The important thing is differences stack up through the whole publication process,’ explains Day.

The study also found that men and women behave differently as reviewers; ‘If you were a woman you were more likely to say major revisions, if you were a man, reject,’ says Day. Reviewers also give preferential treatment to their own gender.


To tackle unconscious bias, the RSC has published a framework for action in scientific publishing, setting out steps editorial boards and staff can take to make publishing more inclusive. In July, other publishers including the American Chemical Society and Elsevier agreed to join the RSC in committing to monitor and reduce bias in scholarly publishing. The group, which represents over 7000 journals, has agreed to pool resources and data, and to work towards appropriate representation among authors, reviewers and editorial decision-makers. It will collaborate on policy development, good practice in collecting diversity data, and sharing lessons learned from trialling new processes.

Journals are clearly trying to ensure a better gender balance of reviewers, and many institutions provide training for staff and reviewers to overcome unconscious biases, but opinions differ on the effectiveness of these measures. A recent report by the professional HR body the Chartered Institute of Personnel and Development highlighted the ‘extremely limited’ evidence that training could change employee behaviour.

I figured, why don’t we just write everybody a cheque?

Johan Bollen, Indiana University Bloomington, US

For better or worse

One idea for preventing bias is to hide the identity of the author of the paper or proposal, something the Engineering and Physical Sciences Research Council (EPSRC) has looked at. ‘We have trialled a number of novel approaches to peer review over the years including those involving anonymous or double-blind peer review,’ says head of business improvement Louise Tillman. But many reviewers say that in relatively small academic communities it’s difficult to ensure anonymity.

At the other extreme, some publications have moved towards open peer review. In February, for example, Nature announced it was offering authors the option of having reports from referees (who can still choose to remain anonymous) published alongside author responses. ‘In an ideal world, you might want the name of the reviewers to be open as well, but there’s a challenge there,’ says Lawrence, ‘[referees] might be unwilling to share a forthright opinion.’ The machine learning journal edited by Lawrence publishes peer reviews alongside papers. ‘That may have been my favourite innovation to come from the EuroNIPS experiment,’ he notes.

Some subjects seem to be slower to change than others. There are few examples of open review chemistry journals, for example, though there are signs of movement here recently – the RSC’s two newest journals offer authors the option of transparent peer review (publishing their paper’s peer review reports). Lawrence says the slow pace may relate to chemistry publishing being dominated by large society publishers. ‘Communities that have struggled are those that have professional institutions managing their reviewing,’ suggests Lawrence, who sees these types of organisation as slow to change.

Funding lotteries

If peer review is such a lottery, why not actually replace it with a lottery? Several funding bodies have trialled such systems. Since 2013, the New Zealand Health Research Council has awarded its ‘explorer grants’, worth NZ$150,000 (£76,000), using a random number generator to select from all applications verified to meet its criteria. Several other funders have tested the idea: the Swiss National Science Foundation experimented with random selection in 2019, drawing lots to select postdoctoral fellowships, and Germany’s Volkswagen Foundation has also used lotteries to allocate grants since 2017. Another, perhaps less brutal, model – though not yet tested – suggests that applications not funded go back into the pot, creating a system more like premium bonds.4
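The mechanics are as simple as they sound. A minimal sketch of a threshold-then-draw lottery (the function and data names below are illustrative, not the HRC’s actual system):

```python
import random

def lottery_awards(applications, n_awards, passes_quality_check, seed=None):
    """Award grants by drawing at random from all applications that pass
    an initial quality threshold - the model behind the New Zealand
    Health Research Council's explorer grants."""
    pool = [app for app in applications if passes_quality_check(app)]
    rng = random.Random(seed)
    return rng.sample(pool, min(n_awards, len(pool)))

# Illustrative use: fund 2 of the qualifying applications
apps = [{"id": i, "meets_criteria": i % 3 != 0} for i in range(20)]
winners = lottery_awards(apps, 2, lambda a: a["meets_criteria"], seed=42)
```

Note that the quality check still happens before the draw – which, as the review below found, is why applicants save little preparation time.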

A recent review of the New Zealand scheme (which has an overall acceptance rate of 14%) found that 63% of surveyed applicants were in favour of the lottery, with 25% against. Perhaps unsurprisingly, support was higher among the winners! But respondents also reported that the system had not reduced the time they spent preparing applications, since they still needed to pass an initial quality threshold to enter the lottery.5

Such systems ‘uniquely embody the worst of all possible worlds,’ says Bollen. ‘It’s essentially funding agencies and scientists saying we cannot do decent [peer] review… it’s almost spiteful.’

Bollen has come up with another idea for funding researchers: ‘I figured, why don’t we just write everybody a cheque?’ In 2019, inspired by mathematical models used in internet search engines, he published his idea for a ‘self-organised fund allocation’ system, where every scientist periodically receives an equal, unconditional amount of funding. The catch is that they must then anonymously donate a given fraction of this to other scientists who are not collaborators or from the same institution.6 Those scientists would then re-distribute a portion of what they receive. Bollen says the model ‘converges on a distribution and allocation of funding that, overall, reflects the preferences of everybody in that community collectively’ – perhaps the ultimate peer review. ‘The results could be just as good, just as fair as the systems that we have now, but without all of the overhead,’ he adds.

Tweaking the model could help solve current issues of bias, for example by mandating researchers to give a certain proportion of their funds to underrepresented groups. Of course such a system might favour academics who ‘talk a good game’ and disadvantage those in obscure fields, but that’s already the case in the present system, says Bollen. So far his model has had no takers, but has received lots of interest from colleagues and funding bodies. He is hopeful that after the current period of upheaval there may be an appetite for change.

Lawrence thinks we may just need to accept that peer review is always going to have flaws; ‘the idea that there’s a perfect, noise free system is the worst mistake.’ A number of recent well-publicised retractions, including a Lancet paper on Covid-19 hydroxychloroquine treatment, show that peer review is not error-proof. According to the website Retraction Watch at least 118 chemistry papers were retracted in 2019. Ultimately we may need to be realistic about what we mean by ‘gold standard’. ‘[Peer review] may be the best system we have for verifying research, but [that doesn’t always] mean [that] the research that’s rejected is somehow flawed or the research accepted is somehow brilliant,’ says Lawrence.

 

This article was updated on 26 August 2020. An earlier version referred to a comparison of publishing data from different countries, which had not been verified, and Aileen Day’s quoted use of the word ‘bias’ has been clarified to ‘differences’.