Two halves of the mathematical foundations of data science, with topology as the lens. On the theoretical side, theorems: when a finite sample remembers the manifold it was drawn from, when a Vietoris–Rips complex is homotopy-equivalent to the ground truth, when the Gromov–Hausdorff distance between two metric spaces is even computable. On the applied side, the disreputable data—finance, climate, biology, fluid mechanics—that wanted to know in the first place. The two halves keep each other honest, and, on bad days, embarrassed.
The recurring scaffolding of a result is austere: under sampling condition X and parameter range Y, the constructed complex is homotopy-equivalent (or quasi-isometric, or close in the Gromov–Hausdorff distance) to the ground truth. The work is in finding X and Y at once weak enough to hold in practice and strong enough to be useful. The proofs, when they come, tend to look simple in retrospect—which is a polite way of saying they took a long time.
What is constant across the work: a discomfort with descriptors that classify well but explain nothing, and a corresponding insistence that any pipeline used in earnest should come with a written guarantee of when it can be trusted, and when it cannot.
On the theoretical side
Provable methods for shape, graph, and manifold reconstruction. The objects of study are simplicial complexes built from finite samples—Vietoris–Rips, Čech, alpha—and the questions, as the frontispiece insists, are about when and how faithfully such complexes recover the topology and geometry of an unknown ground truth. The tools are inherited from algebraic topology, metric geometry, and computational geometry, all three considerably older than the data they have lately been asked to analyse.
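The Vietoris–Rips construction mentioned above fits in a few lines of plain Python. This is a minimal, unoptimised sketch under illustrative assumptions—the scale parameter `epsilon` and the toy point cloud are made up for the example, not drawn from any project described here:

```python
from itertools import combinations
from math import dist

def vietoris_rips(points, epsilon, max_dim=2):
    """Vietoris-Rips complex up to dimension max_dim: a simplex is
    included iff all pairwise distances among its vertices are <= epsilon."""
    n = len(points)
    close = lambda i, j: dist(points[i], points[j]) <= epsilon
    cx = [frozenset([i]) for i in range(n)]        # 0-simplices: the sample points
    for k in range(2, max_dim + 2):                # k vertices -> a (k-1)-simplex
        cx += [frozenset(s) for s in combinations(range(n), k)
               if all(close(i, j) for i, j in combinations(s, 2))]
    return cx

# Four points on a unit square: at epsilon = 1.1 the sides (length 1)
# become edges but the diagonals (length sqrt(2)) do not, so the
# complex is a topological circle: 4 vertices, 4 edges, no triangles.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
cx = vietoris_rips(square, epsilon=1.1)
```

The toy example is the simplest case where the construction recovers a nontrivial ground truth: the four samples lie on a circle, and at the right scale the complex is homotopy-equivalent to it.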
The current open questions in the programme: closed-form bounds for adaptive-landmark constructions; stability of the Euler characteristic surface under noise; the Gromov–Hausdorff distance, in cases where it can be computed at all, between point clouds drawn from manifolds of different intrinsic dimension.
On the applied side
Where the data is unrepentantly high-dimensional but suspected, often correctly, of living on something simpler. Recent collaborations have hunted monsoon onsets, the topology of the polar vortex, two-phase flow regimes, and the moods of the stock market.
A working assumption: every applied problem starts with a domain expert pointing at a dataset and asking, is there structure in here? The topologist’s contribution is to make that question precise enough to admit a theorem, and the descriptor (a persistence diagram, an Euler characteristic surface, a Reeb graph) general enough to be plugged into a downstream classifier without further apology.
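One of the descriptors named above, made concrete: the Euler characteristic of a Vietoris–Rips complex, swept across scales, gives a one-dimensional slice of an Euler characteristic surface—a fixed-length vector a downstream classifier can consume. A minimal sketch in plain Python, with an illustrative point cloud and scale grid of my own choosing (not from any dataset mentioned here):

```python
from itertools import combinations
from math import dist

def euler_curve(points, epsilons, max_dim=2):
    """Euler characteristic chi = #vertices - #edges + #triangles - ...
    of the Vietoris-Rips complex at each scale in epsilons."""
    n = len(points)
    curve = []
    for eps in epsilons:
        close = lambda i, j: dist(points[i], points[j]) <= eps
        chi = n                                    # each vertex contributes +1
        for k in range(2, max_dim + 2):            # k vertices -> a (k-1)-simplex
            sign = (-1) ** (k - 1)                 # alternating signs by dimension
            chi += sign * sum(1 for s in combinations(range(n), k)
                              if all(close(i, j) for i, j in combinations(s, 2)))
        curve.append(chi)
    return curve

# Four points on a unit square. Below edge length: four isolated points
# (chi = 4); past it: a circle (chi = 0); past the diagonal: every triangle
# fills in and the 2-skeleton is the boundary of a tetrahedron, a
# 2-sphere (chi = 2).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
features = euler_curve(square, [0.5, 1.1, 1.5])
```

The resulting vector changes with the shape of the cloud but not with its labelling or ordering, which is what lets it be handed to a classifier without further apology.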
Active projects
Seven projects, each a corner of the broader programme. Click a card for collaborators, papers, and the longer description.