Skip to main content

New Mathematical Framework Better Illustrates Complex Data Patterns

Mathematicians with UC San Diego’s Halıcıoğlu Data Science Institute (HDSI) have unveiled a novel approach that could transform how scientists analyze and understand complex data sets across multiple fields. Image by Envato Elements.
Mathematicians with UC San Diego’s Halıcıoğlu Data Science Institute (HDSI) have unveiled a novel approach that could transform how scientists analyze and understand complex data sets across multiple fields. Image by Envato Elements.

Published Date

Article Content

Mathematicians with UC San Diego’s Halıcıoğlu Data Science Institute (HDSI) – part of the School of Computing, Information and Data Sciences – have unveiled a novel approach to hierarchical clustering – a theoretical framework that could transform how scientists analyze and understand complex data sets across multiple fields.

The new research, published in the Journal of Machine Learning Research, takes an innovative “axiomatic approach” to defining how populations can be grouped in meaningful ways when data follows specific patterns.

“What makes our work significant is how it bridges classical mathematical theory with modern data science needs,” explained Ery Arias-Castro, co-author of the study, HDSI affiliate and professor in the UC San Diego Department of Mathematics. “We have essentially created a mathematical foundation that works reliably across different types of data distributions.”

At the heart of the work is a technique inspired by Lebesgue integration — a fundamental concept in advanced calculus — which the researchers have adapted to work with various data density patterns. The approach encompasses a simple setting to start with, and then builds up to more complicated patterns that real-world measurements might follow.

Elizabeth (Lizzy) Coda, co-author of the study and a graduate student with the UC San Diego mathematics department explained the work with a real-world analogy. “Imagine trying to find meaningful groups in a city's population based on where people live,” Coda said. “Traditional methods might struggle when population density varies dramatically between downtown high-rises and suburban neighborhoods, but our new framework accounts for these variations automatically.”

Arias-Castro and Coda’s research confirms that when certain conditions are met — such as when the underlying density forms connected patterns and changes smoothly — the approach aligns perfectly with what's known as Hartigan's definition of cluster trees, a respected standard in the field.

“The approach we take is at the population level, rather than the data level,” Arias-Castro said. “The idea is that the theoretical definition is something that gives meaning to statistical inference, and to questions such as whether an observed cluster is ‘real’ or not.”

Share This:

Category navigation with Social links