Someone forwarded me a recent Twitter announcement saying that a preprint conclusively shows that the National Science Foundation exhibits systemic racism in its grant funding decisions: "Decades of systemic racial disparities in funding rates at the National Science Foundation," C.Y. Chen et al. (2022). The announcement was followed by a large number of retweets proclaiming that this is amazing work and that something needs to be done about the National Science Foundation grant funding process. The lead author of the study also tweeted out a message about the paper saying to read it and think about it, both of which I did. I came to the conclusion that this preprint and its accompanying social media hype highlight the danger of just saying "studies show" without actually understanding what a given study can or cannot really show, and that they are an indication of the rise of what I call "Twitter Science," as I describe in this essay.
I am often asked by nonscientist family members and friends for my opinion about a new science result they just heard about in the media or on Facebook or Twitter. Often these stories begin with "scientists prove" or "studies show." When I look at the actual scientific paper being highlighted, I sometimes find that it is something simply cute or amusing but not exactly groundbreaking. Other times it is clear that what is presented in the news report, post, or tweet is not actually what the paper was saying, or that the paper is merely suggesting the possibility of something rather than "showing" it. For example, the paper might simply be providing some evidence for possible explanations of a long-standing observation, or it might be presenting a highly speculative idea that has a low probability of actually turning out to be true. Some of my colleagues have expressed concern that the media may make stronger claims about the results than the original paper did.
The general public can be forgiven for not understanding that just because the media has highlighted an interesting idea with a low probability of being correct, this does not mean the idea is true. For example, I might get something from a friend saying, "Scientists showed that our universe is the result of a collision within the multiverse! You must be very excited by this since the James Webb Space Telescope (JWST) will soon prove this for certain any day now!" I have to explain that this is just a very speculative idea put forward by some cosmologists, and that at best some hint of whether it is true might be obtained in the future once the JWST has made long-term observations of features in the universe; however, even if something is found in the large-scale structure of the universe, there could be many other possible explanations that differ from the proposed collision with another universe. The response I get is, "Well, if this was so speculative, why did the media report on this or why was it tweeted everywhere? It must be very important and is probably soon going to be confirmed!" I reply that the media reports on it because it is cute and because people are more likely to like and remember this story than the announcement that some group just experimentally observed an unexpected sign reversal of the anomalous Hall effect for a multilayer doped compound at 30 Kelvin in a 2 Tesla magnetic field.
The other type of science story that gets tweeted and highlighted in the media is the kind that seems to confirm a prevailing popular opinion or helps enforce a political point of view. Some years ago a vegetarian friend told me, "Scientists have shown that eating meat makes people selfish and antisocial." That particular study was from Diederik Stapel, a Dutch scientist who performed a series of high-profile studies that were published in top scientific journals and were regularly quoted by the media. The problem was that all of Stapel's work turned out to be fraudulent and the papers were retracted (Wikipedia, 2022). One could say that this simply demonstrates that "science works," meaning that fraudulent results get detected, which is true, but just recently the same friend brought this study up again to make a point even though I told him years ago that the study was fraudulent. In other words, the general public has now decided that science has shown that meat eaters are jerks, which we all kind of knew anyway, and this will be an accepted fact from now on, just like the "fact" that bumblebees cannot fly. Never mind that even if the study had not been fraudulent, it would not have proved anything but merely provided some evidence for a particular interpretation of the data, and this does not even touch on potential problems such as a possible lack of representative data sets (had the data actually existed instead of being fabricated) and so forth. For example, it could be that all of the data was taken in a town containing nothing but selfish meat-eating people. This is of course the well-known phenomenon of sample bias, and to prevent it, many studies would have to be performed repeatedly and shown to give the same result before such an interpretation could be considered to have a reasonable probability of being true. What Diederik Stapel figured out is that if you say what many people want to hear, you will get more attention and more funding for your work. He of course resorted to out-and-out fraud, but one does not need to go that far; one can just fudge a bit, or better yet, simply fail to mention any other possible explanations of the data.
One can see why the general public is susceptible to these types of issues: most people do not really understand how science works. They think it is more like a light switch: performing the study is like flipping the switch, and either the light turns on or it does not, so you have a clear-cut result and you are done. They are not necessarily acting unreasonably, because they assume that if something is being talked about on social or news media, it must not only be important but must also have been carefully vetted and be highly likely to be true.
The issue I am having is that I am noticing more and more working scientists engaging in the same behavior as the general public: they email or tweet that some new "studies show" or "prove" such and such a result, or they tweet out, "Our group just submitted a new preprint showing that something is happening for sure for this reason..." This is followed by dozens or even hundreds of retweets from other scientists saying, "amazing job," "super great work," "truly convincing," "no doubt a Nobel prize awaits you," or "double plus good work." Many of the retweets arrive only a handful of minutes after the original posting, so it is obvious that few retweeters actually read the paper. This is particularly odd because most scientists regularly review papers and attend talks. A paper review can be rather time-consuming when done properly, and any scientist can tell you that they may have initially thought a paper was likely correct and novel, only to find that all sorts of issues arose once they actually looked deeper into the results or thought about the implications of what the authors were saying. Often one realizes that one's first impression of the paper, based only on the title or abstract, did not at all match the actual results, or that the evidence supporting the authors' claims is rather tenuous or has multiple other possible causes. Until the paper has been fully read and understood, a scientist cannot be sure what the authors are actually saying, what the implications of the work are, or how likely it is that the results are correct and the interpretation is valid. This is on top of assuming that the authors are being as honest as they can be; due to the large volume of papers that practicing scientists must read, it is often necessary to trust that the authors are not hiding something or cutting corners.

Many high-level journals demand that papers not be circulated to the public before publication because the editors want multiple referees to carefully examine the paper to ensure that there are no issues with the results or conclusions. There are also preprint servers where people commonly post their papers before the review process concludes. Used properly, these servers allow the authors to obtain feedback from a broader community before the paper is published. In my field, and probably in most other fields, scientists understand that papers that have appeared only on preprint servers do not carry the same weight as published papers. This is on top of the fact that even if the paper is published, generally many more studies or follow-ups must be performed to obtain a strong confirmation of the results and interpretation or to develop a full understanding of the phenomenon under consideration.
I would expect that scientists are already completely familiar with the points I have just raised; nevertheless, I constantly see scientists touting the results of a preprint or even a published paper that they have not read or understood, not because of the actual contents of the paper, but because they assume it supports one of their scientific or political beliefs since a media or social media announcement said "a fantastic new study shows."
This brings me to a recent example of this phenomenon and the dangers it entails. Recently a preprint was posted on the internet that was titled, "Decades of systemic racial disparities in funding rates at the National Science Foundation," C.Y. Chen et al. (2022). In this preprint, the authors analyze funding data from the National Science Foundation (NSF) and show that there is a persistent and longstanding gap between the rate at which proposals submitted by white Principal Investigators (PIs) are funded and the rate at which proposals submitted by PIs from other groups are funded. The relative funding success rate gap is over 20 percentage points between Asian and white PIs. In the abstract, the authors write, "The prevalence and persistence of these racialized funding disparities have cascading impacts that perpetuate a cumulative advantage to White PIs across all of science, technology, engineering, and mathematics." The authors then attribute the gaps in relative funding success rates to systemic racism, writing, "Systemic racism manifests at the NSF as higher funding rates for proposals by White PIs than those by non-White PIs, conferring a cumulative racial advantage."
The preprint was tweeted out by the authors along with the message, "The asks: (1) Read the paper. (2) Send it to people. (3) Talk about it — here & elsewhere." It is indeed being talked about on various science blogs and reported in science news outlets such as Science magazine, but I have doubts about how often the preprint (which runs to 72 pages) was actually read or seriously discussed. One of the most stunning graphs in the preprint, Fig. 6, shows that over a two-decade time frame, there is a cumulative advantage in proposal funding amounting to +12820 proposals for white PIs versus -9701 for Asian PIs, with the size of the gap increasing over time. A similar but much smaller effect appears for Black and Hispanic PIs (Chen et al. (2022)). Science magazine ran a news story on the preprint (Mervis (2022)) discussing the implications and stating, "The team gave a copy of its analysis to NSF leadership, which is not challenging its conclusions. NSF Director Sethuraman Panchanathan 'shares these concerns [about] systemic racial disparities in funding at NSF and other federal agencies,' an agency spokesperson says." Another news media report (Schwartz (2022)) stated, "The results, which NSF is not disputing, suggest that 'systemic racism manifests at the NSF as higher funding rates for proposals by White PIs than those by non-White PIs,' the study's preprint contends."
The amount of social media and science news attention expended on this preprint shows that it is indeed having an impact and that the issues it raises must be addressed; however, I have found no discussion of the actual results or of other possible reasons for the disparities. Instead, all of the social media traffic consists of congratulations to the authors for clearly showing systemic racism, or statements that the racial disparity would probably be even worse if additional factors were considered.
When I first heard about the results in the preprint, I thought they sounded rather odd because I know that the fraction of Asian research faculty members being hired at universities has been increasing throughout the past two decades. Chen et al. (2022) even provide evidence of this in their Fig. S3, where they show an increase in the fraction of Asian PIs over time. The oddity arises because one thing on which everyone will agree is that a huge consideration in the hiring of research faculty at universities is the ability of a potential faculty member to obtain external grant money. Any economist will tell you that rational actors would avoid hiring faculty members from groups that have a strongly depressed probability of getting grant funding, and yet, among new research faculty hires, the fraction of Asians is growing while the fraction of whites is decreasing. I also found it odd that the funding rate gap grows quite linearly as a function of time over a two-decade window. Growth of this type is usually a sign that some other variable acting as a driving force must also be increasing or decreasing, and indicates that other variables need to be considered when trying to explain the data. I suppose that whites could be becoming even more racially biased over time, but this does not match my observations of the behavior of whites in society as a whole. I immediately thought of several alternative possibilities, different from systemic racism, that could be contributing to the funding gap, and read the paper carefully to see if the authors considered other possible explanations apart from racial bias, but of course they did not. This is particularly relevant because the authors cannot measure systemic racism directly from the data; instead, they must infer that it is responsible for the gaps in funding success rates. The gaps the authors find are certainly real, but the authors need to make an effort to show that a racial bias effect is the most likely cause before they make a claim of systemic racism.
There are in fact two major alternative explanatory mechanisms that both lead to a relative funding success rate gap between white PIs and Asian PIs, and also between white PIs and PIs of other racial groups. Neither mechanism involves systemic racism. The first is related to the point I already mentioned: the fraction of Asian research faculty being hired has increased during the past few decades while the fraction of white research faculty being hired has declined. This demographic shift is reflected in an increase in the fraction of submissions from Asian PIs relative to submissions from white PIs (Fig. S3 of Chen et al. (2022)). This means that there must be an increasing fraction of early career Asian PIs and a decreasing fraction of early career white PIs. If career stage had no impact on funding success rate, this would not matter, but it is well known that established PIs have a funding success rate that is about 1.5 times as high as the funding success rate for early career PIs. This is simply because a PI who has successfully run a previously funded grant has essentially proven themselves and is much more likely to obtain a renewal of funding or a fresh line of funding in a related area. Early career PIs normally lack a track record of managing a research grant, or may have received a single grant that was not managed well and was not renewed. Official statistics show that PIs who are within 7 years of receiving their PhD have lower odds of getting research funding than more senior PIs (Fig. S7 of Chen et al. (2022)). So what does this mean? It automatically implies that there must be a gap in the funding success rate whenever population A is growing while population B is shrinking, because there is an increase in the fraction of population A that are new PIs with a lower success rate, and an increase in the fraction of population B that are experienced PIs with a higher success rate. Since the fraction of Asian PIs is going up and the fraction of white PIs is going down, a larger fraction of Asian PIs are new submitters compared to white PIs, and therefore the Asian PIs aggregated as a whole will have a lower probability of funding success compared to the white PIs. The data in Fig. S7 of Chen et al. (2022) shows that Asian PIs have the largest increase in the fraction of submitted NSF proposals, rising from 11 percent to 23 percent, or about a factor of two. There is also an increase in the fraction of proposals from Black and Hispanic PIs, but this increase is much smaller. At the same time, the fraction of submissions from white PIs has dropped from 78 percent down to 55 percent. Taken together, the implication is that white PIs should have a higher funding success rate than Black, Hispanic, and Asian PIs, but that the gap with the Asian PIs should be largest since the fraction of submissions from Asian PIs had the largest growth. In other words, this implies that the system contains a bias against younger PIs rather than a racial bias. Bias against funding younger PIs does not have any nefarious implications; it simply indicates that in the competitive system of assistant professorships, not all young people will get tenure or be successful at establishing a productive research program.
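To make this aggregation effect concrete, here is a minimal sketch in Python. It is my own toy construction rather than a calculation from Chen et al. (2022): the two career-stage success rates are of the same order as the 2016 NSF figures quoted later in this essay, while the new-PI shares are entirely hypothetical.

```python
# Toy model of the aggregation effect: two groups face identical success
# rates at each career stage, but group A (growing) contains a larger
# share of new PIs than group B (shrinking). No bias enters anywhere.

S_NEW, S_PREV = 0.19, 0.27  # success rates for new vs. experienced PIs

def aggregate_rate(frac_new):
    """Aggregate funding success rate for a group whose new-PI share is frac_new."""
    return frac_new * S_NEW + (1.0 - frac_new) * S_PREV

# Hypothetical new-PI shares as group A grows and group B shrinks.
for frac_new_a, frac_new_b in [(0.45, 0.35), (0.55, 0.30), (0.65, 0.25)]:
    gap = aggregate_rate(frac_new_b) - aggregate_rate(frac_new_a)
    print(f"new-PI share A = {frac_new_a:.2f}, B = {frac_new_b:.2f} -> "
          f"aggregate gap favoring B: {100 * gap:.1f} points")
```

Even though every individual in this toy model faces odds set purely by career stage, the aggregated rates diverge as group A takes on a larger share of new PIs, which is exactly the mechanism described above.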
The problem of experienced PIs having a higher probability of obtaining funding compared to new PIs is a well-documented phenomenon that has been discussed at length for some time, since it impacts not only NSF but also other funding agencies. For example, NIH has experimented with several approaches, such as a pot of funding specifically earmarked for new or early career PIs, to help remedy this issue (Charette et al. (2015)). The key point in the context of Chen et al. (2022) is that at least a portion of the funding success rate gap between white PIs and PIs from all other groups cannot be due to racial bias. A simpler way to describe the effect is that there is a proportionately larger fraction of older white PIs compared to older Asian PIs and a proportionately smaller fraction of younger white PIs compared to younger Asian PIs. Since more of the white PIs are in the submission pool that has a higher probability of funding success, the aggregated white PIs will have a higher funding success rate than the aggregated Asian PIs, even though any individual PI experiences a funding rate determined only by their own career stage. A test the authors of Chen et al. (2022) could perform, or at least suggest, would be to deconvolve the funding success rate for experienced Asian PIs and compare it to that of experienced white PIs to see whether the gap in relative funding success rate changes.
This brings us to the second alternative explanatory mechanism. Proposals do not all go into a single pool at the NSF; instead, funding is given out by directorates that focus on particular topic areas. This avoids the problem of trying to determine the relative merit of an ocean study and a cosmology study. The funding success rates among the directorates are not uniform because the NSF only puts out a call for proposals in each area but does not control how many proposals are actually submitted in each area. Thus, in some areas the number of proposals might be much higher than the amount of funding available, while in other areas the number of proposals is only somewhat higher than the available funding. The engineering (ENG) and computer sciences (CISE) directorates have the lowest funding success rates, of around 20%, while the biological sciences (BIO) and geosciences (GEO) directorates have the highest funding success rates, of 30% or more (National Science Foundation (2022)). If Asian PIs are overrepresented in submissions to ENG and CISE but underrepresented in submissions to BIO and GEO, while at the same time white PIs are underrepresented in submissions to ENG and CISE but overrepresented in submissions to BIO and GEO, there will once again be a funding success rate gap favoring white PIs over Asian PIs. It turns out that the data quoted in Chen et al. (2022) shows that exactly this mechanism is occurring. In Fig. 4(d) of Chen et al. (2022), the authors show NSF funding data from the years around 2015 and plot the fraction of submissions to individual NSF directorates coming from different groups. Sure enough, Asian PIs are overrepresented by a factor of about 1.6 in ENG and CISE and are underrepresented in GEO and BIO by a factor of two. Conversely, white PIs are overrepresented in GEO and BIO by a factor of 1.25 and underrepresented in CISE and ENG by a factor of 0.8. This means that once again, white PIs will have a higher funding success rate than Asian PIs simply because there are different population weights in the different directorates. As we saw earlier, this would not matter if all directorates had an equal probability of success, but in fact the directorates have quite different probabilities of success. This is an example of Simpson's paradox, well known in the statistics community, in which improper aggregation of groups with different success rates into a single group can mask the true trends in the data (Bickel, Hammel, and O'Connell (1975)). Other factors could also come into play. For example, it could be that there is a higher growth in young Asian PIs in ENG and CISE, which would enhance the difference in funding success rates between Asian and white PIs.
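A minimal sketch of this directorate-mix version of Simpson's paradox follows. The two success rates track the rough figures quoted above (around 20% for ENG/CISE and 30% or more for BIO/GEO), but the submission-mix weights are illustrative placeholders of my own choosing, not the actual values from Chen et al. (2022).

```python
# Simpson's paradox via directorate mix: both groups have identical odds
# within each directorate cluster, yet their aggregate rates differ
# because their submissions are distributed differently.

rates = {"ENG/CISE": 0.20, "BIO/GEO": 0.32}  # per-directorate success rates

# Hypothetical fraction of each group's proposals going to each cluster.
weights = {
    "Asian PIs": {"ENG/CISE": 0.62, "BIO/GEO": 0.38},
    "white PIs": {"ENG/CISE": 0.38, "BIO/GEO": 0.62},
}

for group, mix in weights.items():
    overall = sum(mix[d] * rates[d] for d in rates)
    print(f"{group}: aggregate funding rate {100 * overall:.1f}%")
# Prints 24.6% for the Asian PIs and 27.4% for the white PIs: a gap of
# nearly 3 points appears with no within-directorate difference at all.
```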
To summarize what we have covered so far, the relative funding success rate gap between Asian PIs and white PIs found in Chen et al. (2022) could simply be the result of an increase in the fraction of young Asian PIs compared to young white PIs, combined with a difference in the submission rates of white and Asian PIs to different NSF directorates that have different success rates. Both effects produce a gap in funding success rates, and over time they will produce a graph of cumulative advantages and disadvantages in funding awards with exactly the same shape as that found in Fig. 6 of Chen et al. (2022). The preprint authors briefly mention in passing that other factors could come into play, but fail to note that from the data contained in their own preprint, they could have made a quantitative estimate of the importance of these effects. In fact, both of the effects I have described can easily be shown not only to be occurring, but also to produce a relative funding success rate gap that quantitatively matches the gap shown in the preprint, indicating that virtually all of the gap can be explained by these two effects (Reichhardt and Reichhardt (2022)). Indeed, a back-of-the-envelope calculation requiring less than 10 minutes and using the data in Chen et al. (2022) immediately indicates that ignoring these effects is a serious oversight.
Here is an example of how to calculate the size of the funding success rate gap produced solely by demographic changes. First, we need the demographics. The NSF data quoted by Chen et al. (2022) shows that proposals from Asian PIs increased from about 14% to nearly 24% of the total number of submissions over two decades, corresponding to a factor of 1.7. At the same time, the fraction of proposals from white PIs decreased from 78% to 50%, a factor of close to 0.67. Therefore the ratio of the increase in Asian PIs to the decrease in white PIs is about 2.5. Next, we address the difference between new and experienced PIs. We need to know F_new, the fraction of each group that are new PIs, and F_prev, the fraction of each group that are experienced PIs. We also need to know S_new, the success rate for new PIs, and S_prev, the success rate for experienced PIs. Then we can obtain the aggregated success rate P for Asians, P(Asian) = F_new(Asian) × S_new + F_prev(Asian) × S_prev, and for whites, P(white) = F_new(white) × S_new + F_prev(white) × S_prev. It is immediately obvious that if S_new is smaller than S_prev and at the same time F_new(Asian) is larger than F_new(white), then P(white) must be larger than P(Asian).
Taking the actual numbers from NSF funding for 2016, found in Chen et al. (2022), we find that the success rate for new PIs is S_new = 19%, and that for experienced PIs is S_prev = 27%. The NSF data also shows that, aggregated across all groups, a total of 37% of the submitted proposals were from new PIs. According to the demographic change quoted above, Asian PIs are 2.5 times as likely as white PIs to be new PIs. This gives F_new(Asian) = 64.5%, F_prev(Asian) = 35.5%, F_new(white) = 25.8%, and F_prev(white) = 74.2%. Plugging in the numbers, we obtain P(white) = 24.9% and P(Asian) = 21.8%, a difference produced solely by the relative increase in hiring of Asian faculty members. Although it may seem natural to compare the modest difference between these success rates, Chen et al. (2022) perform their comparisons with the *relative* funding success rate P_rel, obtained by subtracting the average success rate P(average) from the success rate for a given group and then dividing by P(average). The overall average success rate for grants in 2016 was P(average) = 24.1%. We obtain P_rel(white) = [P(white) - P(average)]/P(average) = 3.3%, and P_rel(Asian) = [P(Asian) - P(average)]/P(average) = -9.5%. This corresponds to a relative funding success rate gap between white and Asian PIs of 12.8%, about half the size of the gap reported in Chen et al. (2022), indicating that changing demographics has a quite substantial effect. A similar calculation can be performed for Black and Hispanic PIs, but the growth in hiring of these two groups is much smaller than the growth in Asian hiring, so the funding gap arising from demographics is correspondingly smaller. A more complicated but still straightforward calculation in Reichhardt and Reichhardt (2022) gives the funding success rate gap that can be attributed to differences in submission rates and funding success rates for different NSF directorates, and when this gap is combined with the gap calculated above, the relative funding success rate gap between white and Asian PIs is 26.9%, nearly the same as what is observed by Chen et al. (2022). Finally, we can copy Chen et al. (2022) and use the relative funding success rate gaps to compute the cumulative "surplus" and "shortfall" in proposals awarded to each group. We obtain the results shown in the Figure. Throughout our calculations there is no racial bias in the funding process; there are only differences in funding success rates for different directorates and different career stages, combined with differences in the fraction of each group that are in the different career stages or are submitting to the different directorates.
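For readers who want to check the arithmetic, the short script below reproduces the career-stage portion of the back-of-the-envelope calculation using the 2016 figures quoted above. The structure and function names are mine, and the small differences from the rounded in-text percentages come from carrying full precision.

```python
# Back-of-the-envelope career-stage calculation with the 2016 NSF figures
# quoted from Chen et al. (2022). No racial bias enters anywhere.

S_NEW, S_PREV = 0.19, 0.27               # success rates: new vs. experienced PIs
F_NEW_ASIAN, F_NEW_WHITE = 0.645, 0.258  # new-PI share within each group
P_AVERAGE = 0.241                        # overall 2016 success rate

def aggregate(f_new):
    """Aggregate success rate for a group whose new-PI share is f_new."""
    return f_new * S_NEW + (1.0 - f_new) * S_PREV

def relative(p):
    """Relative funding success rate as defined by Chen et al. (2022)."""
    return (p - P_AVERAGE) / P_AVERAGE

p_white, p_asian = aggregate(F_NEW_WHITE), aggregate(F_NEW_ASIAN)
print(f"P(white) = {100 * p_white:.1f}%, P(Asian) = {100 * p_asian:.1f}%")
print(f"P_rel(white) = {100 * relative(p_white):+.1f}%, "
      f"P_rel(Asian) = {100 * relative(p_asian):+.1f}%")
print(f"relative gap = {100 * (relative(p_white) - relative(p_asian)):.1f} points")
# Prints P(white) = 24.9%, P(Asian) = 21.8%, P_rel(white) = +3.5%,
# P_rel(Asian) = -9.4%, and a relative gap of 12.8 points, matching the
# roughly 12.8-point demographic contribution derived in the text.
```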
Of course, it is possible that after the effects just discussed are very carefully calculated and corrected for, there could still be a small funding success rate gap that is not explained. There are, however, other possible explanations for such a residual gap besides systemic racism. For example, a portion of the gap could arise from the fact that a significant fraction of Asian PIs are foreign born, meaning that they must deal with a language barrier in writing the proposal that is not present for a native-born PI. I have no evidence that an ESL effect of this type arises, but there is evidence that as much as 80 percent or more of incoming Asian faculty members are foreign born, whereas the opposite is true of incoming white faculty members, so it is not unreasonable to think that a language barrier due to country of origin could come into play. If such an effect is relevant, it again represents an alternative to systemic racism, and the authors should mention this. A test of the effect could be performed by comparing the funding success rate of only those Asian PIs who received their PhD in the United States to the success rate of white PIs who meet the same criterion. In fact, Ginther et al. (2011) showed that a gap in National Institutes of Health (NIH) funding success rates between Asian and white PIs was completely explained by taking nationality into account.
The argument that the funding success rate gap found in Chen et al. (2022) is due to racial bias or systemic racism is problematic because neither the bias nor the systemic racism is being measured directly; instead, their existence must be inferred from the presence of certain features in the data. Since the attribution of causality is only indirect, it is never possible to be certain that racial bias or systemic racism is really causing the observed effects, and it is therefore very important to rule out as many alternative explanations as thoroughly as possible. I find it odd that the authors of Chen et al. (2022) did almost nothing of the kind, but instead seemed very happy to claim immediately that their data demonstrates that NSF is engaging in systemic racism.
There are some other odd things about Chen et al. (2022). For example, the authors state that "a 2011 study showed that Black PIs were funded at roughly half the rate as White PIs (Ginther et al. (2011)). Subsequent analyses revealed additional inequalities across race (Hoppe et al. (2019); Erosheva et al. (2020); Lauer, Doyle, Wang, and Roychowdhury (2021); Ginther et al. (2018); Ginther, Kahn, and Schaffer (2016); Nikaj et al. (2018))." The issue is that Hoppe et al. (2019) actually argues against the existence of additional inequalities, showing instead that 20 percent of the gap in funding success rates between Black and white PIs is explained by the differing topic choices of the two groups. This is essentially the same mechanism we demonstrated in which Asian PIs disproportionately send their proposals to directorates with lower funding rates. In a follow-up study to their 2011 article, Ginther et al. (2018) show that much of the funding success gap between Black and white PIs could be explained by additional factors involving no racial bias. Ginther et al. (2018) write, "The applicant's publication history as reported in the NIH biographical sketch and the associated bibliometrics narrowed the black/white funding gap for new and experienced investigators in explanatory models. We found that black applicants reported fewer papers on their Biosketches, had fewer citations, and those that were reported appeared in journals with lower impact factors. Incorporating these measures in our models explained a substantial portion of the black/white funding gap." Thus, the initial results from the Ginther et al. (2011) study of NIH funding rates have been significantly modified by further studies, which indicate that the evidence for racial bias is modest at best, yet Chen et al. (2022) do not see fit to mention this fact.
In fact, the study in Ginther et al. (2018) was specifically performed to address the concerns raised by the Ginther et al. (2011) study. As Ginther et al. (2018) explain, "In response [to the 2011 study], the NIH Director established a high-level Working Group on Diversity in the Biomedical Research Workforce (WGDBRW), and their report pointed out that potentially important explanatory variables were missing from the previous analysis. The report argued that the ability to distinguish between the competing explanations of the black/white NIH funding gap - application merit, investigator characteristics, or bias in the peer review process - was insufficiently explained by variables included in the analysis, prompting a need for a more detailed evaluation." So this whole thing has happened before: an initial study of proposal funding rates comes out, making a big stir in the news, but after further variables are considered, the results are substantially modified and in fact completely reversed. You would think that Chen et al. (2022) would have pointed out this object lesson in the introduction, or at least considered that the same phenomenon could occur in their own work. Another point that Chen et al. (2022) fail to mention in their preprint is that even the Ginther et al. (2011) study did not find evidence of bias in NIH funding success rates between US-born Asian PIs and white PIs. To have some hope of telling whether a difference really arises due to race, it is necessary to use a sample in which factors other than race do not already give differing funding success rates, such as all new PIs submitting to ENG, with the possible correction for English language skill mentioned above. It might also be necessary to compare PIs who are at institutions of the same type, such as only R1 universities. This is simply a rephrasing of the well-known fact that it is not very helpful to compare apples to oranges to peaches to watermelons; only an apples-to-apples comparison will do (Bickel, Hammel, and O'Connell (1975)).

Along with a colleague, I contacted Chen et al. to ask them about the points raised above and to get their reaction to a preprint comment (Reichhardt and Reichhardt (2022)) that goes into greater detail, but we have received no response. In contrast, when I forwarded the preprint comment to Donna Ginther, the lead author of the 2011 and 2018 studies cited by Chen et al., she responded within a few hours saying that our arguments have "significant merit" and that Chen et al. failed to properly cite several important follow-up papers to studies of racial bias in funding. She also kindly directed our attention to some very recent work (Lauer et al. (2021)) showing that nearly all of the NIH funding success gap between Black and white PIs can be explained by funding heterogeneity mechanisms without invoking claims of racial bias. I find it a bit of a red flag that Chen et al. have not responded to comments on their work since, over the many years I have been in science, whenever I have contacted an author about their work I have always received a fairly rapid response.
Chen et al. (2022) tweeted out their findings and even asked for feedback. This sounds good in principle, but the retweets from other STEM scientists offered very little feedback, instead repeating that the work is amazing and important and clearly shows the effects of systemic racism. What is obvious is that these other scientists did not read the paper, put any thought into it, or consider possible alternative explanations or contributing factors. I also fault some of the science news media for reporting on this preprint without including any counterpoints or caveats noting that this is only a preprint that has not yet undergone peer review, and that other issues could be affecting the data or that other explanations might turn out to be correct. If Chen et al. (2022) had themselves actually considered other possible effects, the science news media reports might have mentioned that fact. The danger is that the more media coverage and tweets the preprint receives, the more likely it is simply to be accepted as true. In my email to Chen et al., in which I sent them a link to Reichhardt and Reichhardt (2022), I told them that they could post it on Twitter to further the feedback they claimed they wanted, but they did not respond, nor did they tweet out the comment on their work.
Twitter may be an excellent platform for posting boasts, emojis, or insults, but it is a very poor medium for conveying complex information with the subtleties that often arise in science. Tweeting that "so and so just put out an amazing fantastic study proving that..." is not the same as a rigorous referee process, and scientists should also make it clear that no scientific study "proves" or "shows" something; it can only provide some evidence or some argument for it. Now that the Chen et al. (2022) preprint is out on social media and being endlessly circulated in tweets, even if referees eventually point out the problems with the analysis and the results are substantially modified (or if the work is never published, or if a comment on the paper is published), the original finding of the preprint will be enshrined as an established truth from now on, right up there with bumblebees not being able to fly, and everyone will simply assume that NSF is facilitating systemic racism.
This underscores the danger of what I call "Twitter science," in which researchers can tweet very shoddy or misleading work just by highlighting a couple of sentences from the results, particularly a result that favors a particular agenda, while conveniently omitting any possible confounding issues or competing effects. When the results align with some political point of view, the tweet is likely to be received happily by followers who shower it with numerous adoring comments and retweet it to hundreds of people, few if any of whom will ever actually read the work. If some months or years later the paper ends up being rejected or modified in a way that undermines the original results, it is a sure bet that this fact will never be tweeted. In the end, the original tweet and claims will be commonly accepted as being true and the authors will be given lots of credit. This is a horrible way of doing science, but if it starts working for advancing the careers of scientists, we are sure to see more of it.
Another effect is what I call "Twitter asymmetry," where, even if someone brings up points or links to articles that contain counterpoints to the result, that person is blocked or attacked. I have even seen cases where a poster will say, "I just read a paper by <so-and-so> and I am so angry because it says <my pet political project> is wrong. I know it is in bad faith; however, I will not even link the article because I do not want to bring attention to it, and I urge others to ignore it," yet when I find said paper, it usually turns out to contain very reasonable scientific points. This tactic is an active method for preventing inconvenient information from circulating. Also, arguing that something is "in bad faith" does not address the points raised and is completely irrelevant. I would very much enjoy seeing this tactic used at the next Low Temperature Physics conference: "I am sorry, but I cannot address your point about how large the error is in our measure of the exponents near the superfluid transition because I believe this point is in bad faith." I am sure the audience would accept that as a valid response. As noted already, Chen et al. did not find it convenient to tweet out a link to Reichhardt and Reichhardt (2022) raising questions about their work.
Since this is Heterodox STEM, this brings me to the additional issue that arises for studies of topics that have political implications. Some people will be afraid to speak up about possible scientific problems with these studies for fear of being attacked or canceled. This is completely counter to how the scientific method is supposed to work, but as we have seen in recent years, a number of professors who have spoken up have paid a price. Thus, even if someone identifies a problematic issue, they are motivated to keep their mouth shut. This leads to an asymmetry in any criticism of these types of studies. Nature abhors a wasted asymmetry, so some scientists might start to take advantage of the criticism asymmetry and select politically charged problems to study, secure in the knowledge that any negative critique of their methodology or conclusions will be treated as a sign of moral failing on the part of the person raising the criticism, making it unnecessary for them to defend their work on a scientific basis.
I am rather suspicious of the motives of many of the people who push for expanded studies in certain politically charged topics and promote works in these areas. Call me cynical, but in many cases I seriously doubt that these people actually care at all about having socially relevant issues addressed in a productive manner; instead, they are simply doing this to advance their own careers. A dead giveaway that the motives are not exactly pure is how fast these people launch an attack (often ad hominem) against anyone who points out an issue with their work, rather than seriously taking the points raised by the criticism into consideration. If they genuinely cared about the topic, they would approach a criticism in the same way that a scientist approaches any legitimate scientific problem: they would study the criticism very carefully to gather all the relevant data and would look for the most effective ways to address the issues raised, because they want the result to be as correct as possible. Indeed, legitimate working scientists normally want to gather as much information as possible on other points of view to help reach the truth. It is rare to see this approach applied for certain research topics; instead, the response to a criticism is insults, accusations, or calls for sanctions. I spent enough of my life in Southern California to know all the tell-tale signs of a hustler when I see one.
REFERENCES
Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975) "Sex bias in graduate admissions: Data from Berkeley." Science 187, p. 398-404.
Charette, M. F. et al. (2015) "Shifting demographics among research project grant awardees at the National Heart, Lung, and Blood Institute (NHLBI)." PLoS ONE 11(12), e0168511.
Chen, C. Y., Kahanamoku, S. S., Tripati, A., Alegado, R. A., Morris, V. R., Andrade, K. and Hosbey, J. (2022) "Decades of systemic racial disparities in funding rates at the National Science Foundation." Available at https://osf.io/xb57u/
Erosheva, E. A. et al. (2020) "NIH peer review: Criterion scores completely account for racial disparities in overall impact scores." Sci. Adv. 6, eaaz4868.
Ginther, D. K. et al. (2011) "Race, ethnicity, and NIH research awards." Science 333, p. 1015-1019.
Ginther, D. K., Kahn, S., and Schaffer, W. T. (2016) "Gender, race/ethnicity, and National Institutes of Health R01 research awards: is there evidence of a double bind for women of color?" Acad. Medicine 91, p. 1098-1107.
Ginther, D. K. et al. (2018) "Publications as predictors of racial and ethnic differences in NIH research awards." PLoS ONE 13, e0205929.
Hoppe, T. A. et al. (2019) "Topic choice contributes to the lower rate of NIH awards to African-American/black scientists." Sci. Adv. 5, eaaw7238.
Lauer, M. S., Doyle, J., Wang, J., and Roychowdhury, D. (2021) "Associations of topic-specific peer review outcomes and institute and center award rates with funding disparities at the National Institutes of Health." eLife 10, e67173.
Mervis, J. (2022) "NSF grant decisions reflect systemic racism, study argues." Science 377, p. 455-456.
National Institutes of Health (2012) "Draft report of the Advisory Committee to the Director, Working Group on Diversity in the Biomedical Research Workforce." Available at: http://acd.od.nih.gov/Diversity%20in%20the%20Biomedical%20Research%20Workforce%20Report.pdf [Accessed 15 Aug 2022].
National Science Foundation (2022) "NSF Merit Review Reports." [online] Available at: https://www.nsf.gov/nsb/publications/pubmeritreview.jsp [Accessed 15 Aug 2022].
Nikaj, S. et al. (2018) "Examining trends in the diversity of the U.S. National Institutes of Health participating and funded workforce." FASEB J. 32, p. 6410-6422.
Reichhardt, C. and Reichhardt, C. J. O. (2022) "Comment on preprint 'Decades of systemic racial disparities in funding rates at the National Science Foundation'." Available at: https://osf.io/ykzvx/ [Accessed 15 Aug 2022].
Schwartz, D. (2022) "Study finds racial bias in NSF grant funding." [online] Tech Transfer Central. Available at: https://techtransfercentral.com/2022/08/02/study-finds-racial-bias-in-nsf-grant-funding/ [Accessed 15 Aug 2022].
Wikipedia Contributors (2022) "Diederik Stapel." [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Diederik_Stapel [Accessed 15 Aug 2022].
BRIEF BIO
Charles Reichhardt, Fellow of the American Physical Society, is a technical staff member in the Condensed Matter and Complex Systems group of the Theoretical Division at Los Alamos National Laboratory.