Distance Measures for Geometric Graphs

Sushovan Majhi
Tulane University, 2023

Today’s Agenda

  • Motivation
  • What is a Geometric Graph?
  • Why defining a similarity measure for them is challenging?
  • The two friends: GED and GGD
  • How do they compare?
  • Computational Complexity of GGD
  • Graph Mover’s distance (GMD)
  • Empirical results
  • Discussions

Motivation

  • Graph representation facilitates easy desciption of relational patterns
  • Pattern Recognition becomes the problem of measuring similarity between two graphs
  • A distance measure is deemed meaningful if
    • small distance implies similarity
    • large distance reveals disparity
  • Graph comaprison has been widely studied both by the Pattern Recognition and Database community.
    • Geometric graphs are new!

Geometric Graphs \mathbb{R}^d

A combinatorial graph G=(V,E) is called geometric if

  • V\subset\mathbb{R}^d
  • An edge (u,v)\in E is identified with the line segment \overline{uv}
  • edges intersect only at their endpoints

Applications

Letter Recognition

Map Comaprison

mapconstruction.org (Ahmed et al. since 2013)

How to Define Distance between Two Graphs?

Given two geometric graphs G, H:

  • check if they are (combinatorial) isomorphic
    • not robust
    • not lenient to local, minor defects
  • correct small defects, but at a small cost

Geometric Edit Operations

  • deletion (or insertion) of a vertex does not cost
  • deletion (or insertion) of an edge costs C_E times its length
  • translation of a vertex costs C_V times the displacment plus C_E times the total length changes in the incident edges.

An Example Edit Path (P)

C_E|u_1u_2|

C_V|v_3-u_2|
+C_E||u_3v_3|-|u_2u_3||

C_V|u_3-v_2|
+C_E||v_2v_3|-|u_3v_3||

Cost (P):= the total cost of the individual edits

Geometric Edit Distance (GED)

GED(G,H):=\inf cost(P), where the infimum is taken over all edit paths from G to H.

Pros

  • the notion is very intuitive
  • GED is a metric for positive contants C_V, C_E.

Cons

  • not computationally feasible
    • infinitely many edit paths

Geometric Graph Distance (GGD)

  • choose isomorhpic subgraphs G',H' of G,H, respectively
  • delete the rest, and pay for it
  • finally, morph G' onto H' by translation, and pay for it

Geometric Graph Distance (GGD)

GGD(G,H):=\min_{\text{matching }\pi}cost(\pi).

Relation:

GGD(G,H)\leq GED(G,H)\leq\left(1+\Delta\frac{C_E}{C_V}\right)GGD(G,H), where \Delta is the maximum degree of the two graphs.

  • The inequalities are, in fact, tight.

Properties of GGD

  • GGD is a metric
  • computationally feasible
  • NP-hard (Majhi and Wenk 2023) even if
    • planar, and
    • C_V,C_E arbitrary
  • No PTAS known

Earth Mover’s Distance (EMD)

  • Our graph mover’s distance is based on the idea of the classic Earth Mover’s Distance
  • Originally developed as a similarity measure between distributions
  • Application to images (Rubner, Tomasi, and Guibas 2000)
  • Can be efficiently solved as a transportation problem on a bipartite graph

Formulation of the EMD

  • Weighted point sets P=\{(p_i,w_{p_i})\}_{i=1}^m and Q=\{(q_j,w_{q_j})\}_{j=1}^n
  • Ground distances [d_{i,j}], cost of transporting a unit product from p_i to q_j
  • Feasibility: \sum w_{p_i}=\sum w_{q_j}
  • A flow of supply is given by a matrix [f_{i,j}], entries denoting the units of supply from p_i to q_j
  • Minimize the cost: \sum_{i=1}^m\sum_{j=1}^n f_{i,j}d_{i,j}
  • Subject to:
    • f_{i,j}\geq0
    • \sum_{j=1}^n f_{i,j} = w_i
    • \sum_{i=1}^m f_{i,j} = w_j
  • The EMD is defined as the cost of the optimal flow.

Graph Mover’s Distance (GMD)

  • GMD can be computed in O(n^3)-time for geometric graphs with at most n vertices

Metric Properties of the GMD

  • GMD is a pseudo-metric
  • GMD(G,H)=0 does not always imply that G=H as geometric graphs

Experiments

  • LETTER dataset from IAM (Riesen and Bunke 2008)
  • 15 prototype drawings for letters (A, E, F, H, I, K, L, M, N, T, V, W, X, Y, and Z) from the English alphabet
  • three levels (LOW, MED, and HIGH) distortions are applied
  • On each distortion level, 2250 letter drawings from the prototypes

Empirical Results

For each test graph

  • compute the GMD to the 15 prototypes
  • sort the prototypes in the increasing order of its GMD
  • check if the label is present in the first k prototypes

Discussions

  • How to retain all the metric properties for the GMD?
  • Can the GMD be useful in supervised learning for classification?
  • Currently, NONE of the graph distances (GED, GGD) is stable with respect to GMD. Can we develop a stable graph distance?
  • Is it possible to approximate the GGD?

References

Ahmed, M., S. Karagiorgou, D. Pfoser, and C. Wenk. since 2013. “Map Construction Portal.” mapconstruction.org.
Majhi, Sushovan. 2023. “Graph Mover’s Distance: An Efficiently Computable Distance Measure for Geometric Graphs.” In Proceedings of the 34th Canadian Conference on Computational Geometry, CCCG 2022, Concordia University, Montreal, QC, Canada, July 31-August 2.
Majhi, Sushovan, and Carola Wenk. 2023. “Distance Measures for Geometric Graphs.” Computational Geometry, 102056. https://doi.org/https://doi.org/10.1016/j.comgeo.2023.102056.
Riesen, Kaspar, and Horst Bunke. 2008. IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning.” In Structural, Syntactic, and Statistical Pattern Recognition, 5342:287–97. Springer. https://doi.org/10.1007/978-3-540-89689-0_33.
Rubner, Yossi, Carlo Tomasi, and Leonidas J. Guibas. 2000. “The Earth Mover’s Distance as a Metric for Image Retrieval.” International Journal of Computer Vision 40 (2): 99–121. https://doi.org/10.1023/A:1026543900054.