Heavy reliance is currently placed on various forms of peer review to promote the growth of basic knowledge [1]. How well do these various forms of peer review promote such growth and how well do our current institutions compare, as systems of incentives and communication, with the many alternative mechanisms used by different cultures across the centuries? Many of these alternatives, such as simple prizes, rely much less on peer review.
While much has been learned about our scientific institutions, current and past, unfortunately very little is known about how well they promote scientific progress. For example, without better ways to compare the "amount" of scientific progress made in different times and places, it is hard to interpret the history of natural experiments with various institutions.
Similarly, while sociological studies have begun to explore bias and disagreement in various forms of peer review [2], such studies so far say little about the overall performance of these institutions -- they usually lack, for example, good measures of the intrinsic quality of the various items reviewed. Apologists for the status quo find it easy to tell stories about how high levels of bias or disagreement are actually healthy, given their conception of how the rest of the academic process works.
Ideally, one would run controlled experiments: let different groups of equally qualified, funded, and informed scientists, organized under different scientific institutions, independently attack the same scientific problems. Then compare their progress toward what would be agreed was the "right" answer. With enough such experiments, the groups that made the fastest progress could be presumed to have the better institutions. However, such experiments would be terribly expensive.
As a substitute, consider turning to the methodology of experimental economics [3]. Experimental economists often try to illuminate the usefulness of various real-world institutions through laboratory experiments on simplified game-like analogues.
Strangers are gathered to play such a simulation game, and each player is paid according to their performance in the game. Player behavior can be compared under alternative game rules, allowing a comparison of these institutions to theory and to each other.
These simplified games by no means capture all the complexity of the real-world situations that they are intended to illuminate. They are more like wind-tunnel models of aeroplanes, which engineers hope will suggest how real planes might behave in real weather.
Yet the economic theories that are usually invoked to argue for one institution over another often do not distinguish between the real situation and the simplified game -- they suggest similar predictions in both cases. Thus when an economic theory does not predict game behavior well, the burden of proof shifts toward those who would argue that this theory should still be relied on to analyze analogous real institutions.
Similarly, imagine that you thought a certain form of peer review was a robust scientific institution, working well for a wide variety of people, with different temperaments, cultures, and abilities; and for a wide variety of scientific problems, in different disciplines and requiring different mixes of data and theory, creativity and persistence, breadth and depth of understanding, etc. If so, you might well expect a laboratory version of this form of peer review to work well when used to pay a random group of laboratory subjects to discover things about some 'scientific' laboratory problem. If this institution didn't work well in the laboratory, you might want to reconsider your expectations about similar institutions in other contexts.
Perhaps an experimental economics approach could help us to evaluate various academic research institutions. However, for this approach to work, simplified game-like models of the academic research process must be devised, models that both apologists and critics will grant capture the essence both of the relevant academic incentive structures and of the scientific tasks that need to be accomplished.
This is a challenging task, and so this paper offers only a first attempt at devising such models, to make vivid the possibilities. Specifically, I suggest using canned murder mystery games as standardized "scientific" problems, and I suggest two simplified institutions to compare: a bare-bones peer-reviewed journal mechanism, and a simple market-based "information prize". This market institution is more in the spirit of the institutions traditionally considered in experimental economics.
I suggest specific experimental procedures for comparing these institutions on these problems, although I have not yet conducted such experiments. Should this form of peer review work surprisingly poorly or well, both apologists and critics of current institutions will have predictable fall-back positions. However, we have to start somewhere.
My simple model of the scientific research situation has one research "patron" and many "researchers". The researchers care only about money, not about research. They might optionally be given an opportunity cost -- some attention-consuming task, perhaps copying pages of text by hand, for which they could be paid some constant wage per minute, say about two-thirds the expected wage if everyone worked as researchers.
The patron would rather keep her money, but she also wants to know the answers to one or more specific research questions of interest to her. She wants to know these answers as soon as possible, and doesn't care who else also knows them. Specifically, an experiment might require that, once a minute, the patron declare (or update) a probability distribution over the possible answers to each question. She might then be paid according to a proper scoring rule on each distribution -- for example, some large constant amount plus the sum of the logs of the probabilities she assigned to the "right" answer each minute.
The patron is not an expert, having no direct access to information relevant to her questions. None the less, I assume that after some sufficiently long time-scale, say just before the "end" of an experimental session, the patron can find out what the "right" answers to her questions are, and hence find out her pay-off, regardless of the researchers' behavior.
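To illustrate, here is a minimal sketch (in Python; the constant and the example probabilities are hypothetical, chosen only for illustration):

    import math

    def patron_payoff(probs_on_right_answer, constant=10.0):
        # Logarithmic proper scoring rule: a constant amount plus the sum,
        # over minutes, of the log of the probability the patron assigned
        # to the answer that turned out to be right.
        return constant + sum(math.log(p) for p in probs_on_right_answer)

    # Example: three minutes of declarations putting 1/8, 1/2, then 0.9
    # on the eventual right answer.
    print(patron_payoff([1/8, 1/2, 0.9]))   # about 7.12

Because the logarithmic rule is a proper scoring rule, the patron maximizes her expected payment each minute by reporting her honest probabilities.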
The researchers are experts, in that they do have access to relevant information. The patron is thus willing to lose some of her money to the researchers, to induce them to tell her what they know and to work toward answering her questions. From the patron's point of view, the output of the research community is the changing current state of "knowledge", i.e. temporary consensus regarding the relative promise of various possible answers to the patron's questions.
In general, the patron's problem is how to buy consensus and research so as to get the most value for her money. Since the researchers know much more than she does, the patron may want to rely on some institutional framework to help her, such as the institutions that are discussed below.
One simple inference task would be to estimate some number, with noisy signals of that number made available at various times to various researchers at various prices. Such a number might even drift according to some rule. Other simple inference tasks include playing Twenty Questions with an oracle, or solving those logic problems that you see in crossword puzzle books.
While these tasks may be simple enough to allow a rigorous theoretical analysis, they also seem a bit too simple to be seriously compared to typical scientific research tasks. Standardized inference tasks are needed that can be solved in a reasonable time, yet are rich and complex -- problems more of information processing than of data gathering. Even when everyone has the same raw data, subjects should often disagree on the relative promise of different possible answers.
A more complex task would be that of estimating the fraction of votes each candidate will receive in a forthcoming election. In fact, experiments have shown that simple market mechanisms can induce much more accurate vote estimates than are provided by concurrent opinion polls [4]. These experiments took many months each, however, much longer than is wanted here.
Murder mysteries seem to me a reasonable compromise, posing complex yet quick and standardized inference tasks. For example, subjects might watch twenty-minute murder mystery videos together, and the patron's question might just be "who done it". (VIDMAX Inc., Cincinnati, OH, sells at least two videodisks with eight such stories each, each with eight suspects.) Alternatively, subjects might each be given a somewhat different written packet of information (such as in "How to Host a Murder" games, Intuitive Marketing).
Thus I propose an environment containing a patron who wants to know "who done it" as quickly as possible, and containing money-loving researchers with access to clues.
All the institutions that one might hope to evaluate typically involve a great many scientists, yet most laboratory experiments will only involve a small group of subjects. To avoid the results being dominated by the complex social dynamics known to exist in small groups, one might prefer to forbid informal interaction -- use subjects who are initially strangers to each other, forbid informal discussion, and require all interaction to take place through formalized institutional channels.
My bare-bones model of a peer review institution focuses on journal peer review, which seems more standardized and beloved than proposal peer review. This model also focuses on rewards per publication, rather than per citation. Thus I suggest having one journal, and letting some fixed pot of money be distributed after some fixed duration in proportion to each researcher's publication count.
Journal articles can simply be 3 x 5 cards with the author's ID at the top, and some other content on the rest of the card. Each researcher may publish one free article at the start of a session, and after that a new journal issue is published whenever three more submitted articles have been accepted for publication.
A researcher submits an article by submitting three identical cards, which are distributed discreetly to three reviewers, each the author of a random previous publication. Reviewers evaluate by writing comments on their copy, and by assigning an integer score from zero to ten. Reviewers needn't return these reviews immediately, but they may not submit new articles of their own until they do. Review comments are returned to the author anonymously, without flagging the reviewer.
Experiment instructions must be careful not to suggest what criteria to use in reviewing articles -- no "moral obligation" to help the patron should be suggested. Researcher subjects can be told, however, about the existence of the patron subject and about her incentives. An article is accepted if the average score is at least five. Copies of each new journal issue are distributed to all researchers (presuming that a photocopier is nearby). A token representing each new article is placed in an opaque bag to allow random selection of reviewers. The patron also gets a copy of each issue -- in a bare-bones peer review model she must choose her probability distributions based only on published journal articles. Similarly, subjects may not communicate other than by writing and reviewing articles.
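A minimal sketch of the acceptance rule and the final payment arithmetic just described (Python; the data structures and names are merely illustrative):

    ACCEPT_THRESHOLD = 5.0   # accept when the average of the three integer scores is at least five

    def accepted(scores):
        # scores: the three integer scores, each from zero to ten
        return sum(scores) / len(scores) >= ACCEPT_THRESHOLD

    def final_payments(publication_counts, pot):
        # Divide a fixed pot of francs in proportion to each researcher's publication count.
        total = sum(publication_counts.values())
        return {r: pot * n / total for r, n in publication_counts.items()}

    print(accepted([4, 5, 7]))                         # True: the average is 5.33
    print(final_payments({"A": 3, "B": 1}, pot=786))   # {'A': 589.5, 'B': 196.5}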
That completes my bare-bones peer review model. There are of course endless variations one could consider, including co-authors, institutional teams, and competing journals managed by editors. A citation-based variation might divide a pot in proportion to citations, and select some reviewers from the authors of cited articles. A proposal-based variation might have submissions include a dollar amount of funding requested, and for each funding period the highest scoring submissions might be funded until a fixed pot runs out.
While some [6] have suggested that certain markets could also be used to fund research, no one to my knowledge has suggested, much less demonstrated, how such markets could be used to buy research on arbitrary patron-chosen questions. Without patron funding, markets in such questions may be too thin to be very well informed, and may not even exist.
I claim that such markets can, however, be used to fund and induce research [7]. Specifically, a patron could create an inventory-based automated broker, a trader whose price for fixed-size trades is a monotonic function of its inventory. If one arranges the bid-ask spread so that the broker just breaks even when prices move in a cycle, then this broker will lose money to better informed traders as the price moves to a final resting point of certainty regarding one of the possible answers.
The amount of funding lost in this way can be precisely controlled, and constitutes an "information prize" offered to those who are first to become better informed. The prospect of winning this prize creates an incentive for potential traders to become better informed, and hence can induce research, just as with other types of prizes.
More specifically, subjects could be allowed to exchange 100 "francs" (experiment currency) for a bundle of coupons (or state-contingent assets), one for each suspect: "100 francs if suspect 1 did it", "100 francs if suspect 2 did it", etc. This trade would also be allowed in reverse at any time, and at the experimental session's end, each coupon for the guilty suspect could be exchanged for 100 francs.
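Note that a complete bundle of coupons, one per suspect, is worth exactly 100 francs whatever the verdict, which is why the 100-franc exchange can safely be offered in both directions. A small check (Python; the eight-suspect case is assumed):

    SUSPECTS = range(1, 9)   # eight suspects, numbered 1 to 8

    def coupon_payoff(coupon_suspect, guilty_suspect, face_value=100):
        # A coupon pays its face value only if its suspect turns out to be the guilty one.
        return face_value if coupon_suspect == guilty_suspect else 0

    # One coupon per suspect always pays 100 francs in total, no matter who did it.
    for guilty in SUSPECTS:
        assert sum(coupon_payoff(s, guilty) for s in SUSPECTS) == 100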
An information prize broker can be implemented by a rectangular array of slots, for all combinations of suspects and possible broker prices. One reasonable set of broker prices, in francs, is:
1, 3, 5, 7, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90. Each slot contains either a coupon of the suspect for that column, or contains (franc) cash in the amount of the price for that row.
The trading rule is this: any subject can switch any slot between these two states. If there is a coupon in a slot, one can take it and replace it with the right amount of cash. If there is cash there, one can take that and replace it with the right coupon.
Initially, low-price slots contain cash and high-price slots contain coupons. For initially equi-probable suspects, the borderline slots are those whose prices straddle 100 divided by the number of suspects. The total value given away is the initial array value (the cash initially in the array, plus 100 francs for each row of coupons above the cash, since each such row forms a complete bundle) minus the final array value (the sum of the prices in any one column, since at certainty the guilty suspect's column holds only cash and every other coupon is worthless). For eight suspects and the broker prices given above, the information prize would be 786 francs. For reasonable liquidity, each researcher should start with at least 200 francs cash in hand.
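The 786-franc figure can be checked directly. Here is a sketch of the arithmetic (Python), using the price ladder given above and valuing each full row of coupons as a 100-franc bundle:

    PRICES = [1, 3, 5, 7, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90]
    SUSPECTS = 8
    BUNDLE = 100   # a full row of coupons, one per suspect, is worth 100 francs

    # Initially, slots priced below 100/8 = 12.5 francs hold cash; higher-priced slots hold coupons.
    cash_prices = [p for p in PRICES if p < BUNDLE / SUSPECTS]
    coupon_rows = [p for p in PRICES if p >= BUNDLE / SUSPECTS]

    initial_value = SUSPECTS * sum(cash_prices) + BUNDLE * len(coupon_rows)   # 304 + 1000 = 1304
    # At certainty, every slot in the guilty suspect's column holds cash and all other
    # coupons are worthless, so the final array is worth the sum of one column's prices.
    final_value = sum(PRICES)                                                 # 518

    print(initial_value - final_value)   # 786 francs: the information prize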
The information prize institution, then, just allows subjects to make the trades described above. As with peer review, a bare-bones version would not allow researchers to talk to each other, and the patron could only view the current state of the broker array.
A session might consist of ten researcher subjects who watch an eight-suspect twenty-minute murder mystery video over thirty minutes, with ten minutes of planned breaks along the way.
If twenty subjects were recruited to be researchers, they could be divided into two groups of ten. Ideally, the subjects in each group would be strangers to each other. One group could do two sessions with peer review first, and then two sessions with an information prize. The other group could reverse the order of the institutions. This would take a minimum of two hours per group; with time for instructions and other delays, about three hours per group, which could be done in a single meeting. To control for motivation level, both institutions could give away the same total funding per session, say 786 francs. This suggests an exchange rate of about 9 francs per dollar, to pay an average of $15 per hour if subjects are paid $10 just for showing up.
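As a rough check on this arithmetic (Python; the figures are those given above):

    SESSIONS = 4
    PRIZE_PER_SESSION = 786   # francs given away per session
    RESEARCHERS = 10
    HOURS = 3                 # hours per group, including instructions and delays
    TARGET_WAGE = 15          # dollars per hour, on average
    SHOW_UP_FEE = 10          # dollars, paid just for showing up

    francs_per_researcher = SESSIONS * PRIZE_PER_SESSION / RESEARCHERS   # 314.4 francs
    dollars_to_earn_in_francs = HOURS * TARGET_WAGE - SHOW_UP_FEE        # 35 dollars
    print(francs_per_researcher / dollars_to_earn_in_francs)             # about 9 francs per dollar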
In addition to researcher subjects, patron subjects are also needed. Since the institutions above do not provide a way for the patron to influence the researchers, patron subjects do not need to be present at the same experiment meetings. Instead, a complete record of publication times and price movements could be recorded, perhaps on videotape, to be played back to patrons at a later date.
Three redundant and independent patrons per researcher group could be used, each going through the same four sessions as their researcher group. Using the same net wage and initial payment, a patron should earn an average of about 2.6 francs per minute during sessions, which suggests a minimum of 0 and a maximum of 6 francs for each scoring rule application. With eight suspects, total ignorance gives a score (the natural log of p, the probability given to the right answer) of -2.08 and perfect knowledge gives a score of 0, which suggests paying 6 + 3*Ln(p) francs each time the scoring rule is applied. Patron subjects could usefully be put through some scoring rule calibration exercises before this experiment, and might even be selected from a larger potential subject pool on this basis.
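A sketch of this calibration (Python), deriving the 2.6 francs per minute pace and checking the range of the suggested per-application payment:

    import math

    FRANCS_PER_DOLLAR = 9
    NET_DOLLARS = 3 * 15 - 10    # $35 to be earned in francs, as for researchers
    SESSION_MINUTES = 4 * 30     # four 30-minute sessions, scored once per minute

    # Average pace of patron earnings during sessions:
    print(NET_DOLLARS * FRANCS_PER_DOLLAR / SESSION_MINUTES)   # 2.625, i.e. about 2.6 francs per minute

    # Range of the suggested per-application payment, 6 + 3*Ln(p):
    print(6 + 3 * math.log(1 / 8))   # about -0.24 at total ignorance over eight suspects
    print(6 + 3 * math.log(1.0))     # 6.0 at certainty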
The total subject wages estimated above would be around $1200. Computerization of this environment and institutions would aid in experimental control, and lower the cost of replication, but at a substantial development cost.
The average amount paid to the research patrons under an institution is an estimate of the quality of that institution. Statistics on these amounts, or on the full time-series of patron predictions, could be used to draw preliminary conclusions about which institution works better, at least from a patron's point of view.
The main value of a pilot like this is to suggest how to run more elaborate experiments. Many variations can be imagined. The peer review institution could be elaborated in response to criticism. One could see how results scale to larger research communities. A direct-management type of research institution might also be compared. One might allow different funding institutions to co-exist in each experimental session -- the fraction of funding given via each institution could be systematically varied, or patrons could be allowed to choose these fractions.
2. US General Accounting Office. "Peer review -- reforms needed to ensure fairness in federal agency grant selection". Technical Report GAO/PEMD-94-1, June 1994.
3. Davis, D. and Holt, C. "Experimental Economics", Princeton University Press, Princeton (1993).
4. Forsythe, R., Nelson, F., Neumann, G. and Wright, J. "The explanation and prediction of presidential elections: a market alternative to polls", In T. Palfrey (ed.), Laboratory Research in Political Economy, University of Michigan Press, Ann Arbor, 69-112 (1991).
5. Ibid.
6. Hirshleifer, J. "The private and social value of information and the reward to inventive activity", American Economic Review, 61, no. 4 (1971): 561-574.
7. Hanson, R. "Information prizes -- patronizing basic research, finding consensus", in Western Economics Association meeting, Lake Tahoe, June 1993; idem, "Could gambling save science? Encouraging an honest consensus", Social Epistemology, 9 (1995): 3-33 (this issue).