
Science has an AI problem. This group says they can fix it.



AI holds the potential to help doctors find early markers of disease and accelerate research on other important scientific advances. But a growing body of evidence has revealed deep flaws in how machine learning is used in science, a problem that has swept through dozens of fields and implicated thousands of erroneous papers.

Now an interdisciplinary team of 19 researchers that includes Marta Serra-Garcia of the University of California San Diego’s Rady School of Management has published guidelines for the responsible use of machine learning in science.

“When we graduate from traditional statistical methods to machine learning methods, there are a vastly greater number of ways to shoot oneself in the foot,” said Arvind Narayanan, director of Princeton University’s Center for Information Technology Policy, who led the research team along with Princeton computer scientist Sayash Kapoor. “If we don’t have an intervention to improve our scientific standards and reporting standards when it comes to machine learning-based science, we risk not just one discipline but many different scientific disciplines rediscovering these crises one after another.”

Because machine learning methods are new and used by many different disciplines, it is important to develop guidelines that can ensure the credibility of these methods as their use expands. A paper detailing the team’s guidelines was recently published in the journal Science Advances.

“Many researchers are concerned with the emerging reproducibility crisis in the use of these methods, which could be as serious as the replication crisis that emerged in social psychology more than a decade ago,” said Serra-Garcia, an associate professor of economics and strategy at the Rady School.

The good news is that a simple set of best practices can help resolve this newer crisis before it gets out of hand, according to the authors, who come from computer science, mathematics, social science and health research.

“This is a systematic problem with systematic solutions,” said Kapoor, a graduate student who works with Narayanan and who organized the effort to produce the new consensus-based checklist.

The checklist focuses on ensuring the integrity of research that uses machine learning. Science depends on the ability to independently reproduce results and validate claims. Otherwise, new work cannot be reliably built atop old work, and the entire enterprise collapses. While other researchers have developed checklists that apply to discipline-specific problems, notably in medicine, the new guidelines start with the underlying methods and apply to any quantitative discipline.

Marta Serra-Garcia, Associate Professor of Economics and Strategy


One of the main takeaways is transparency. The checklist calls on researchers to provide detailed descriptions of each machine learning model, including the code, the data used to train and test the model, the hardware specifications used to produce the results, the experimental design, the project’s goals and any limitations of the study’s findings. The standards are flexible enough to accommodate a wide range of nuance, including private datasets and complex hardware configurations, according to the authors.

While the increased rigor of these new standards might slow the publication of any given study, the authors believe wide adoption of these standards would increase the overall rate of discovery and innovation, potentially by a significant amount.

“What we ultimately care about is the pace of scientific progress,” said sociologist Emily Cantrell, one of the lead authors, who is pursuing her Ph.D. at Princeton. “By making sure the papers that get published are of high quality and that they’re a solid base for future papers to build on, that potentially then speeds up the pace of scientific progress. Focusing on scientific progress itself and not just getting papers out the door is really where our emphasis should be.”

Kapoor concurred. The errors hurt. “At the collective level, it’s just a major time sink,” he said. That time costs money. And that money, once wasted, could have catastrophic downstream effects, limiting the kinds of science that attract funding and investment, tanking ventures that are inadvertently built on faulty science and discouraging countless young researchers.

In working toward a consensus about what should be included in the guidelines, the authors said they aimed to strike a balance: simple enough to be widely adopted, comprehensive enough to catch as many common mistakes as possible.

They say researchers could adopt the standards to improve their own work; peer reviewers could use the checklist to assess papers; and journals could adopt the standards as a requirement for publication.

“The scientific literature, especially in applied machine learning research, is full of avoidable errors,” Narayanan said. “And we want to help people. We want to keep honest people honest.”

The paper, “Consensus-based recommendations for machine-learning-based science,” was published on May 1 in Science Advances. Its authors are: Sayash Kapoor, Princeton University; Emily Cantrell, Princeton University; Kenny Peng, Cornell University; Thanh Hien (Hien) Pham, Princeton University; Christopher A. Bail, Duke University; Odd Erik Gundersen, Norwegian University of Science and Technology; Jake M. Hofman, Microsoft Research; Jessica Hullman, Northwestern University; Michael A. Lones, Heriot-Watt University; Momin M. Malik, Center for Digital Health, Mayo Clinic; Priyanka Nanayakkara, Northwestern University; Russell A. Poldrack, Stanford University; Inioluwa Deborah Raji, University of California-Berkeley; Michael Roberts, University of Cambridge; Matthew J. Salganik, Princeton University; Marta Serra-Garcia, University of California-San Diego; Brandon M. Stewart, Princeton University; Gilles Vandewiele, Ghent University; and Arvind Narayanan, Princeton University.

Adapted from a Princeton University release
