Disruptive Research?

The news headlines raise alarms about the current state of scientific research: “Why is science slowing down?” “The Consolidation-Disruption Index Is alarming: Science has a crummy-paper problem” “Innovation in science is on the decline and we’re not sure why” “What happened to all of science’s big breakthroughs?”

These headlines, just a few of those reporting on results from the recently published paper “Papers and patents are becoming less disruptive over time” (Park et al., 2023), give the impression that scientists of today are less creative and innovative than those of the past and that the glory days of science are over. The data analysis by Park et al. (2023), however, does not support these dire conclusions. Let’s look at what the authors actually do in the paper and what the results imply.

The “CD Index”

Park et al. (2023) analyzed citations to 25 million papers in the Web of Science, a database that gives bibliographic citations of articles in more than 9,000 scientific journals dating back to 1900. Using this citation information, Park et al. (2023) calculated a “CD index” for each paper published between 1945 and 2010, stating that the CD index “characterizes the consolidating or disruptive nature of science and technology.”

Here is how they compute the CD index of each “focal paper” (the paper for which the index is computed).* For a given focal paper, they consider each paper in the Web of Science that (a) is published less than five years after the focal paper and (b) cites the focal paper and/or at least one of the references listed in the focal paper. Each of these future papers is given a score: -1 if the future paper cites both the focal paper and at least one of the references in the focal paper; 0 if the future paper cites at least one of the references in the focal paper but not the focal paper itself; 1 if the future paper cites the focal paper but none of the references in the focal paper. The CD index of the focal paper is the average of the scores of the future papers associated with it.

Thus, the CD index is computed exclusively from citation patterns in future papers. If all of the future papers cite the focal paper but none of its references, the focal paper achieves the maximum CD index of 1. If all of the future papers cite the focal paper and at least one of the articles referenced by the focal paper, the focal paper achieves the minimum CD index of -1.
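To make the scoring rule concrete, here is a minimal sketch in Python of how the CD index of a single focal paper could be computed from citation records. The function, its input format, and the example paper IDs are hypothetical illustrations of the calculation described above, not Park et al.’s actual code or data.

```python
# Minimal sketch of the CD index calculation described above.
# The input format and paper IDs are hypothetical, not Park et al.'s data.

def cd_index(focal_refs, future_papers):
    """Compute the CD index of one focal paper.

    focal_refs: set of IDs of the papers cited by the focal paper.
    future_papers: list of (cites_focal, refs_cited) pairs, one for each
        paper published within five years of the focal paper that cites
        the focal paper and/or at least one of its references.
    Returns the average of the -1/0/+1 scores (None if no such papers).
    """
    scores = []
    for cites_focal, refs_cited in future_papers:
        cites_predecessor = bool(focal_refs & refs_cited)
        if cites_focal and cites_predecessor:
            scores.append(-1)  # cites the focal paper and a predecessor
        elif cites_focal:
            scores.append(1)   # cites the focal paper but no predecessor
        elif cites_predecessor:
            scores.append(0)   # cites a predecessor but not the focal paper
        # papers citing neither would not be included in future_papers
    return sum(scores) / len(scores) if scores else None

# Example: three future papers with scores -1, +1, and 0 give CD index 0.
focal_refs = {"ref_A", "ref_B"}
future_papers = [
    (True, {"ref_A", "other_paper"}),  # cites focal paper and ref_A: -1
    (True, {"unrelated_paper"}),       # cites focal paper only: +1
    (False, {"ref_B"}),                # cites ref_B only: 0
]
print(cd_index(focal_refs, future_papers))  # prints 0.0
```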

Park et al. (2023) call a paper with a high CD index “disruptive,” arguing that if a paper is truly innovative and leads science in a new direction, subsequent research articles will cite it (because it breaks new ground) but are less likely to cite its (presumably less innovative) predecessors. They call a paper with a low CD index “consolidating,” since subsequent research citing it also tends to cite its predecessors—Park et al. (2023, p. 139) argue that “for future researchers, the knowledge upon which the work builds is still (and perhaps more) relevant.”

The main result of Park et al. (2023)—the result that spurred the newspaper headlines at the beginning of this column—is that the average CD index of papers published in each year has declined since 1945, with the sharpest decline occurring before 1970. In the physical sciences, for example, they found that the average CD index dropped from 0.36 in 1945 to 0 in 2010.

Peer Review and Statistical Issues

That result—the decline in the 5-year CD index between 1945 and 2010 for every scientific area studied—spurred the science-sky-is-falling headlines listed above. There are several statistical reasons, however, to treat those headlined conclusions with skepticism. I’ll mention a few of them here; see Bornmann et al. (2020) for a thoughtful discussion of issues with citation measurements (including a study of how these measures correlate with peer reviewers’ judgments) and a literature review of other work.**

  1. Calling a statistic a measure of disruptiveness does not make it so. I can define a “heart health index” as the number of cups of oatmeal a person eats in a year, but that doesn’t mean my index measures anything of interest (except, perhaps, to Quaker Oats).

    If any result from this study should be reported, it is that the percentage of papers that are cited by future work that does not also cite the paper’s predecessors was higher in the 1940s and 1950s than in later years. This is more difficult to communicate (and to make sound exciting so readers will click on a news story) than announcing “a steady drop since 1945 in disruptive feats,” but it has the advantage of being an accurate portrayal of the paper’s findings.

    Incidentally, Park et al. (2023) do not say that the number of papers with high CD index has declined over time, just that the proportion of papers with high CD index has declined. The number of papers published per year today is much higher than 75 years ago. Phrases such as “a steady drop in disruptive feats” misrepresent the paper’s claims.

  2. Peer review and citation practices have changed between 1945 and today. Statisticians are cautious about drawing causal conclusions from observational data, because there are often other factors at play (called confounding variables) that can explain an observed result. There were many changes in science between 1945 and today, and the decline in the CD index could be due to these changes rather than to any decline in “disruptive” research. In particular, the scientific publishing process today differs greatly from that of the 1940s and 1950s.

    Scientific papers today undergo “peer review,” in which the paper is sent to a set of experts in the field who review the results for accuracy and importance, make suggestions for improvements, and recommend acceptance or rejection for the journal. Routine peer review as part of the scientific process, however, is a relatively recent development, with some scholars dating it back to the mid-1970s. Before then, a journal editor typically accepted or rejected an article on his own authority and only occasionally sought input from other scientists. Baldwin (2018) told the story of Einstein’s fury in 1936 upon learning that the editor of Physical Review had sent his submitted paper to another physicist for review; Einstein wrote to the editor that he had not authorized him to show the manuscript to anyone before publication and that “[o]n the basis of this incident I prefer to publish the paper elsewhere.”

    Consider Watson and Crick (1953), the paper that proposed the double helix structure of DNA (given as an example of a high CD index by Park et al., 2023). Watson and Crick (1953) cited only six papers in their two-page note. Their reference list included work establishing that DNA had roughly equal amounts of adenine and thymine, and roughly equal amounts of guanine and cytosine, but did not include other milestones such as the discovery of DNA. Significantly, Watson and Crick did not cite the work of Rosalind Franklin and her students, whose X-ray photographs of DNA samples were crucial for the insight that the structure is a double helix. Franklin’s colleague Maurice Wilkins had shown the photographs to Watson and Crick without her permission; Watson and Crick had also read an internal report of the British Medical Research Council that summarized Franklin’s unpublished research (some of which was published in the same issue of Nature as the Watson and Crick paper).

    Watson and Crick (1953) was not peer-reviewed. The editors decided to publish it without obtaining outside opinions (Baldwin, 2015). If it had been refereed, however, the reviewers would likely have requested that the authors include citations to other notable work. And had Watson and Crick (1953) included references to earlier milestones or to Franklin’s work, that work may well have been cited by subsequent research papers, thereby lowering the CD index of the paper.

    By contrast, consider Jinek et al. (2012), one of the papers on gene editing that resulted in a Nobel Prize for Jennifer Doudna and Emmanuelle Charpentier. Jinek et al. (2012) contains 47 references and was peer-reviewed. Papers today cite many more references than in the past, and the sheer numbers make it more likely that a subsequent paper will cite one of the references.*** Reviewers often suggest additional literature that should be cited; some authors include extensive bibliographies in their submitted manuscripts to forestall potential criticism that they may have overlooked an important precursor to their work.

  3. Using easily manipulated measures encourages people to game the system at the expense of quality. The statistician W. Edwards Deming frequently wrote about the hazards of using numerical measures to rank people: “The problem lies in the difficulty to define a meaningful measure of performance. The only verifiable measure is a short-term count of some kind.” (Deming, 1986, p. 103). Deming wrote that quality results from establishing a system dedicated to quality improvement and allowing people to take pride in their work. Establishing numerical quotas is antithetical to quality, and promotes the narrow activity measured rather than broader goals.

Evaluating the significance of a piece of scientific research requires expertise and deep study. Substituting a single number (or even a set of numbers) for that process inevitably misses aspects of innovation and gives scientists incentives to achieve a high score rather than to produce high-quality research. In the social sciences, this corruption of quantitative indicators is known as Campbell’s Law: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (Campbell, 1976, p. 49).****

Park et al. (2023) suggest some actions that could be taken to promote innovative research, including having universities place more emphasis on research quality rather than quantity and giving scientists more time to “inoculate themselves from the publish or perish culture, and produce truly consequential work.” Ironically, one of the drivers of the publish or perish culture is the substitution of easily computed numbers (such as publication counts, citation counts, and journal impact factors) for the difficult and subjective process of evaluating quality (see McKiernan et al., 2019 for a study of the use of journal impact factors in university promotion and tenure decisions).

Suppose that, instead of relying so heavily on manipulable metrics, universities and funding agencies focused on system-wide changes to promote innovation—encouraging creativity and discouraging busywork. Now that would be truly disruptive.

Footnotes

*The formula given in Figure 1 of Park et al. (2023) is

CD_t = (1/n) ∑_i (−2 f_it b_it + f_it),

where

n = number of forward citations to the focal paper or its predecessors at time t,

f_it = 1 if future paper i cites the focal paper; 0 if not,

b_it = 1 if future paper i cites at least one predecessor of the focal paper; 0 if not.
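Substituting the three possible citation patterns into the formula recovers the scores described in the main text: a future paper citing both the focal paper and a predecessor (f_it = 1, b_it = 1) contributes −2(1)(1) + 1 = −1; one citing only a predecessor (f_it = 0, b_it = 1) contributes 0; and one citing only the focal paper (f_it = 1, b_it = 0) contributes +1.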

**See what I did here? If this blog post were a refereed paper, then by citing Bornmann et al. (2020) but none of the papers it references, I would be contributing to a higher CD index for Bornmann et al. (2020), even though my primary purpose for citing the paper is its literature review (i.e., its “consolidation” of previous work).

***Park et al. (2023) include supplementary analyses in which they “normalized” the CD index to account for changing citation patterns but do not give details of how the calculations were done. Most normalization methods, however, would adjust only for the number of citations and would not necessarily reflect the deeper changes underlying the evolving citation patterns.

****I wrote about how numerical educational evaluation systems known as value-added models could be gamed, and about better uses of statistical methods for improving quality of teaching, in Lohr (2012, 2015).

References

Baldwin, M. (2015). Credibility, peer review, and Nature, 1945–1990. Notes and Records: The Royal Society Journal of the History of Science, 69(3), 337-352.

Baldwin, M. (2018). Scientific autonomy, public accountability, and the rise of “peer review” in the Cold War United States. Isis, 109(3), 538-558.

Bornmann, L., Devarakonda, S., Tekles, A., and Chacko, G. (2020). Are disruption index indicators convergently valid? The comparison of several indicator variants with assessments by peers. Quantitative Science Studies, 1(3), 1242-1259.

Campbell, D.T. (1976). Assessing the impact of planned social change (Paper #8 of the Occasional Paper Series). Hanover, NH: Public Affairs Center, Dartmouth College. Reprinted in Journal of Multidisciplinary Evaluation in 2011.

Deming, W.E. (1986). Out of the Crisis. Cambridge, MA: Massachusetts Institute of Technology Center for Advanced Engineering Study.

Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J. A., and Charpentier, E. (2012). A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science, 337(6096), 816-821.

Lohr, S. (2012). The value Deming’s ideas can add to educational evaluation. Statistics and Public Policy, 3(2), 1–40. https://doi.org/10.1515/2151-7509.1057

Lohr, S. (2015). Red beads and profound knowledge: Deming and quality of education. Education Policy Analysis Archives, 23(80), 1-24. https://doi.org/10.14507/epaa.v23.1972

McKiernan, E.C., Schimanski, L.A., Nieves, C.M., Matthias, L., Niles, M.T., and Alperin, J.P. (2019). Meta-research: Use of the journal impact factor in academic review, promotion, and tenure evaluations. eLife, 8, e47338.

Park, M., Leahey, E., and Funk, R.J. (2023). Papers and patents are becoming less disruptive over time. Nature, 613, 138–144. https://doi.org/10.1038/s41586-022-05543-x

Watson, J.D., and Crick, F.H.C. (1953). Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature, 171(4356), 737-738. The San Francisco Exploratorium posted an annotated version of this paper that explains its contributions, history, and controversies.

Copyright 2023 Sharon L. Lohr
