How AI Created a Nonexistent Statistician

I recently finished preparing PowerPoint slides for a talk titled “Ahead of the Trends: J.N.K. Rao’s Contributions to Survey Research” that I will present in November at the Statistics Canada International Methodology Symposium on “Shaping the Future of Official Statistics.”

After submitting my slides, which discuss how Rao’s work anticipated major trends in survey research, I grew curious whether an artificial intelligence (AI) system would identify the same contributions as I did. I queried Google’s AI system, Gemini. In the following, my queries are in boldface, Gemini’s responses are indented in italics, and my commentary is in normal text.

Query: What are J.N.K. Rao's most important publications in statistics?

J.N.K. Rao: A Renowned Statistician

J.N.K. Rao is indeed a highly respected figure in the field of statistics. He’s made significant contributions to survey methodology, statistical inference, and related areas.

While providing an exhaustive list of his most important publications is beyond the scope of this response, here are some of the key areas where J.N.K. Rao has made substantial contributions and where you can find his influential works:

● Survey Methodology: Rao’s work on sampling theory, nonresponse, and estimation has been pivotal.

● Statistical Inference: His contributions to likelihood-based inference, empirical likelihood, and robust methods are widely recognized.

● Small Area Estimation: Rao has been a pioneer in developing methods for estimating characteristics of small geographic areas.

This summary from Gemini is pretty accurate, overall, if a bit vague.* Rao has been influential in survey methodology, inference, and small area estimation and has contributed to all of the areas listed.

Confidently Wrong Answers

Then I decided to get a little more specific and asked:

Query: What is the impact of Hartley and Rao's 1968 publication "A new estimation theory for sample surveys"?

Hartley and Rao's 1968 paper, "A new estimation theory for sample surveys," was a groundbreaking work that significantly advanced the field of survey methodology.

The paper introduced the concept of direct linearization for estimating population totals from complex sample surveys. This method provided a more efficient and flexible approach to estimation compared to traditional methods.

Key Impacts of the Publication:

Foundation for Modern Survey Methodology: The direct linearization technique became a cornerstone of modern survey estimation. It’s widely used in statistical software packages for analyzing complex survey data.

Improved Efficiency and Accuracy: The method often leads to more efficient and accurate estimates compared to older techniques.

Stimulated Research: The paper inspired further research in survey methodology, leading to the development of new estimators and variance estimation methods.

Practical Applications: The techniques introduced in the paper are used in various fields, including economics, sociology, public health, and market research.

Gemini’s statement that this paper was a groundbreaking work is correct (but hardly a risky guess since I asked about its impact). Everything else in the answer is either a generality or wrong. Sentences such as “The method often leads to more efficient and accurate estimates compared to older techniques” and “The techniques introduced in the paper are used in various fields” could apply to almost any publication in statistics.

More serious is Gemini’s fundamental error about the contribution of Hartley and Rao (1968). The paper discussed a nonparametric inferential framework (the “scale-load” method) that presaged the development of empirical likelihood methods. No part of the Hartley-Rao paper deals with linearization, a technique for finding variance estimates using Taylor series that dates to 1935 (or earlier; see Doob, 1935; Oehlert, 1992; ver Hoef, 2012).
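For readers unfamiliar with the term, here is a minimal sketch of what linearization refers to. It is an illustration only and is not drawn from Hartley and Rao (1968) or the other cited papers. The function name and the simplifying assumptions (a simple random sample, no finite population correction) are choices made for this example. It approximates the variance of a ratio of two sample means with a first-order Taylor expansion.

```python
import statistics


def ratio_linearization_variance(y, x):
    """Estimate R = ybar / xbar and approximate its variance by linearization.

    Illustrative assumptions: y and x are paired observations from a simple
    random sample, and the finite population correction is ignored.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("y and x must have the same length, at least 2")

    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    r = ybar / xbar

    # First-order Taylor expansion of ybar / xbar around the means gives
    # the linearized values e_i = (y_i - r * x_i) / xbar.
    e = [(yi - r * xi) / xbar for yi, xi in zip(y, x)]

    # The variance of the ratio is approximated by the variance of the
    # mean of the linearized values.
    var_r = statistics.variance(e) / n
    return r, var_r


if __name__ == "__main__":
    r, var_r = ratio_linearization_variance([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
    print(r, var_r)
```

When every x value equals 1, the ratio reduces to the sample mean of y and the approximation reduces to the usual s²/n, which is a quick sanity check on the sketch.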

I tried again, giving Gemini a hint about the paper’s contribution.

Query: What was the impact of the scale-load method proposed in Hartley and Rao's 1968 publication "A new estimation theory for sample surveys"?

The Scale-Load Method: A Correction

There seems to be a misunderstanding. While Hartley and Rao’s 1968 paper was indeed groundbreaking in survey methodology, it did not introduce a “scale-load method”.

The core contribution of the paper was the direct linearization technique for estimating population totals. This method involves approximating nonlinear functions of survey data by linear functions, making it computationally efficient and applicable to complex survey designs.

Possible Confusion

The term “scale-load method” might be a confusion with other statistical techniques or a misremembered concept. It’s possible that you’re thinking of a different paper or a different statistical method.

Gemini sticks to its misconception, confidently telling me “There seems to be a misunderstanding” and saying I am confused. I tried several more times to get Gemini to acknowledge the actual contribution of the paper but Gemini dug in, asserting “A mismatch in focus” and incorrectly stating: “Hartley and Rao's 1968 paper and empirical likelihood are distinct areas of statistical development, with minimal direct connections.”

Creating a Nonexistent Statistician

I then switched to another topic, asking:

Query: Who are the most important researchers in survey sampling?

Pioneers and Leaders in Survey Sampling

Determining the "most important" researchers in survey sampling is subjective and can depend on specific criteria. However, several names consistently emerge as influential figures in the field.

Historical Figures

Morris Hansen: Known for his work on sample survey theory and practice, including the development of the Hansen-Hurwitz estimator

William N. Hurwitz: Collaborated with Hansen on numerous groundbreaking studies in survey methodology.

W. Edwards Deming: While primarily known for his contributions to quality control, Deming’s work on sampling and statistical process control has had a significant impact on survey methodology

Modern Influential Figures

Leslie Kish: Renowned for his contributions to sample design, including the development of design-based inference and complex sample survey analysis.

J.N.K. Rao: A leading expert in survey methodology, with significant contributions to sampling theory, nonresponse, and small area estimation.

Thomas M.F. Smith: Known for his work on sample design, estimation, and nonresponse adjustment.

Richard M. Bellhouse: Made significant contributions to the theory and practice of survey sampling, particularly in the areas of complex surveys and small area estimation.

I have no quibble with most of the names that Gemini lists: Hansen, Hurwitz, Deming, Kish, Rao, and Smith all have contributed substantially to survey research. Its list naturally includes J.N.K. Rao since I had just been asking about him. I note, however, that Gemini does not identify any female statisticians among the influential figures — AI systems have a tendency to propagate biases in the databases or text sources they use (see Buolamwini, 2023). As with the previous answers, Gemini’s descriptions of contributions are vague.**

But look at the last name in the list. Richard M. Bellhouse sounds like an impressive fellow who has made “significant contributions.” Just a slight problem: no survey statistician by that name exists. Richard M. Valliant and David Bellhouse are both well-known survey statisticians; Gemini portmanteaued their names to create a new statistician, Richard M. Bellhouse.

Buyer Beware

I believe AI shows great promise for improving the practice of statistics.*** As I wrote earlier this year, statisticians have been using AI methods for more than 200 years (linear regression can be viewed as a form of machine learning). AI methods have great potential for discovering patterns in data, interpreting narrative responses to survey questions, writing and updating computer code, and applications as yet undreamed of. But one must be skeptical of the results of any automated system, and particularly skeptical of products of generative AI language systems, which merely repeat information gleaned from online sources.

Douglas Hofstadter (2023) described what happened when ChatGPT was asked to compose a first-person essay on why Hofstadter wrote his famous book Gödel, Escher, Bach: an Eternal Golden Braid. The resulting prose, wrote Hofstadter, “has virtually nothing in common with my writing style and what it says does not agree at all with the actual story that underlies the book’s genesis.” Of the AI-generated motivation for writing the book (“To celebrate the intricate tapestry of ideas that bind the works of Gödel, Escher, and Bach”), Hofstadter wrote: “That may at first sound poetic and grand, but to my ear it is just vapid pablum.”

I had similar thoughts regarding my initial queries. The AI system issued some vague language describing J.N.K. Rao’s contributions, but could not go beyond repeating the same phrases over and over. When I asked about contributions of a specific paper written by Rao, its answer was completely wrong but also sounded supremely confident. In response to my query about influential researchers in survey methodology, Gemini named a few real statisticians (all men) but then made up a new fictional statistician. Only an expert in survey methodology would be able to distinguish the truth in Gemini’s answers from the fiction.

AI systems, however, are programmed to “learn” when new information comes in. I had asked several times about the relationship between Hartley and Rao (1968) and empirical likelihood. I wanted to see if Gemini had learned from these interactions, so I cleared my cache and history and tried again.

Query: What was the impact of the scale-load method proposed in Hartley and Rao's 1968 publication "A new estimation theory for sample surveys"?

Hartley and Rao's scale-load method introduced a novel approach to estimation in sample surveys, laying the groundwork for subsequent developments in statistical inference. This method, later rebranded as empirical likelihood, has had a profound impact on various fields ….

This time, Gemini got the basic connection right, and its answer went on to talk about the scale-load method’s impact in survey sampling and statistical inference: “The scale-load method provided a nonparametric alternative to traditional parametric methods, offering greater flexibility in handling complex data structures and avoiding assumptions about underlying distributions.”

When I repeated my query on “Who are the most important researchers in survey sampling,” Gemini listed four eminent survey statisticians, adding William Cochran and Don Rubin to previously mentioned statisticians Morris Hansen and J.N.K. Rao. But it also listed several statisticians who had little connection with survey research, and made up two new fictional statisticians: Mark Kish and his “son” Louis Kish, a contemporary researcher who “has continued his father’s legacy … particularly in the areas of nonresponse and weighting.” Gemini seems to have confused these names with real statistician Leslie Kish (1910-2000), whom it had mentioned in response to my original query.

AI encompasses numerous techniques, all intended to mimic some aspect of human cognition (see U.S. General Services Administration, 2024, for detailed descriptions of AI methods and “AI myths”). AI techniques such as machine learning extract patterns from large data sets, and have been used for many years to explore relationships among variables. Large language models have not been around as long, and are still the subject of intense research. Although Gemini flubbed some of my questions, a researcher might still find its output useful for suggesting directions of further inquiry, as long as she verifies the information using other sources before proceeding. I was impressed by Gemini’s ability to learn new information when corrected and, in some cases, acknowledge its limitations. I ended my second session by asking a follow-up question about the fictional statistician Richard M. Bellhouse:

Query: What are Richard M. Bellhouse’s major contributions to survey sampling?

I do not have enough information about that person to help with your request. I am a large language model, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions, but my knowledge about this person is limited. Is there anything else I can do to help you with this request?

Footnotes and References

*Gemini makes one error: small area estimation is concerned with estimating characteristics of subpopulations (which can be geographic areas) for which the survey sample size is small; this is not necessarily the same as being a small geographic area. For example, Kings County, NY (home of Brooklyn) has a geographic area of 96 square miles and Mohave County, AZ has a geographic area of approximately 13,000 square miles. But if the survey has a sample of 5,000 people in Kings County, then that sample size is adequate to estimate characteristics of the county without the need for small area estimation methods. On the other hand, if the survey samples no people in Mohave County, then we need to use small area estimation models to come up with estimates for that county. “Small area” refers to the sample size in the area, not the geographic size.
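The point about sample size can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a small area estimation method: the function name is hypothetical, the characteristic is taken to be a proportion of 0.5, and the sample is treated as a simple random sample. Only the sample sizes (5,000 and 0) come from the example above.

```python
import math


def direct_estimate_se(p, n):
    """Standard error of a directly estimated proportion p from n sampled people.

    Returns None when n is 0, since no direct estimate can be formed at all.
    Assumes simple random sampling (an assumption made for this illustration).
    """
    if n == 0:
        return None
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Kings County: geographically small, but 5,000 sampled people.
    print(direct_estimate_se(0.5, 5000))
    # Mohave County: geographically large, but nobody sampled.
    print(direct_estimate_se(0.5, 0))
```

Neither county's square mileage enters the calculation. Only the number of sampled people determines whether a direct estimate is precise enough or whether a model is needed.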

**Vague about all of the researchers, and misleading about the contributions of Leslie Kish. He did work in sample design, but was not responsible for the development of design-based inference or complex survey sample analysis.

***My view accords with those of Messeri and Crockett (2024), who wrote about the potential of AI for science. In an interview with Leffer (2024), Messeri said: “We worry that with the expansion of AI, the questions that AI is well suited to answer will be mistaken for the entire field of questions one could ask.” Crockett added: “Being skeptical about AI in science doesn’t require being a hater of AI in science. We love science. I’m excited about its potential for science. But just because an AI tool is being used in science does not mean that it is automatically better science.”

Buolamwini, J. (2023). Unmasking AI: My Mission to Protect What Is Human in a World of Machines. New York: Random House.

Doob, J. (1935). The limiting distributions of certain statistics. Annals of Mathematical Statistics 6, 160–169.

Dorfman, R. (1938). A note on the δ-method for finding variance formulae. The Biometric Bulletin 1, 129–137.

Hartley, H.O. and Rao, J.N.K. (1968). A new estimation theory for sample surveys. Biometrika 55(3), 547–557.

Hofstadter, D. (2023). Gödel, Escher, Bach, and AI. The Atlantic, https://www.theatlantic.com/ideas/archive/2023/07/godel-escher-bach-geb-ai/674589/

Leffer, L. (2024). The risks of trusting AI. Scientific American 330(6), 80–81.

Messeri, L. and Crockett, M.J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature 627(8002), 49–58. https://www.nature.com/articles/s41586-024-07146-0

Oehlert, G. (1992). A note on the delta method. The American Statistician 46, 27–29.

U.S. General Services Administration (2024). AI Guide for Government: A living and evolving guide to the application of Artificial Intelligence for the U.S. federal government. Washington, DC: IT Modernization Centers for Excellence, General Services Administration (accessed September 23, 2024).

ver Hoef, J.M. (2012). Who invented the delta method? The American Statistician 66(2), 124–127. See also the letter to the editor by Stephen Portnoy in The American Statistician 67(3), 190, which gives references on the delta method from 1861.

Sharon Lohr