Community detection on geometric graphs

A local-to-global perspective

B. R. Vinay Kumar

NETWORKS Training week
April 9, 2025
Kaap Doorn, Netherlands

Geometric graphs

Wireless networks

Social networks

Data science

Geometric graphs? Vertices are embedded in a metric space and edges depend on the distance between nodes.
Presence of short edges and abundance of triangles.

Questions?

How does the network structure affect processes or information on the network?
Can local algorithms help to solve a global problem?
Does geometry help in solving a global problem efficiently?

Community detection

Partition the vertex set of a graph into subsets meaningfully.

Group of nodes that are more connected to one another than to the rest of the network.

Two approaches

Model-based methods: Build a model from data and design algorithms for the model.
Model-free methods: Modularity based methods, Louvain algorithm etc.

Community detection

Co-authorship networks

Community detection

Interactions depend on:

Node communities
Distance between nodes
Vertex weights

Co-authorship networks

Model

Torus $ S=\left[\frac{-1}{2},\frac{1}{2}\right]^d$

$\mathbf{A}$ $\sim GKBM(\lambda,n,d,\kappa_{\text{in}},\kappa_{\text{out}})$

Poisson point process $\mathbb{X}$$= \{X_u\}_{u=1}^N$ of intensity $\lambda n$

Sample $N \sim \text{Poi}(\lambda n)$.
Choose $X_1,X_2,\cdots,X_N$ uniformly from $S$.

Weights: $\mathbf{W}$=$\big(W_1,W_2,\cdots, W_N\big), \ \ W_i\in \mathcal{W} \sim F_W(\cdot)$

Two communities: $ \sigma$$ = \big(\sigma(1),\cdots,\sigma(N)\big)$ \[\mathbf{P}(\sigma(u)=\color{#f86d6d}{+1})=\mathbf{P}(\sigma(u)=\color{#ffde20}{-1}) = \frac{1}{2}\]
Geometric kernels: $\kappa_{\text{in}},\kappa_{\text{out}}: \mathbb{R}^+\times\mathcal{W}\times \mathcal{W}\to [0,1]$
$\alpha_{n,d} = \big(n/\log n\big)^{1/d}$

Given locations $\mathbb{X}$, weights $\mathbf{W}$ and communities $\sigma$

\[A_{uv}=1 \text{ w.p. }\begin{cases} \kappa_{\text{in}}\big(\alpha_{n,d}\|X_u-X_v\|,W_u,W_v\big) & \text{if } \sigma(u)=\sigma(v)\\ \kappa_{\text{out}}\big(\alpha_{n,d}\|X_u-X_v\|,W_u,W_v\big) & \text{if } \sigma(u) \neq \sigma(v) \end{cases}\]

Problem Formulation

$\mathbf{A}$ $\sim GKBM(\lambda,n,d,\kappa_{\text{in}},\kappa_{\text{out}})$

Problem: Given the locations $\mathbb{X}$, the weights $\mathbf{W}$ and the graph $\mathbf{A}$, recover $ \sigma_n$ exactly.

An estimate $ \hat{\sigma}_n$ of $ \sigma_n$ is said to recover the communities exactly if \[\lim_{n\to \infty}\mathbb{P}\big( \color{#a689e0}{\hat{\sigma}_n} \in\{\pm \color{#a689e0}{\sigma_n} \}\big) = 1 \]

Problem Formulation

$\mathbf{A}$ $\sim GKBM(\lambda,n,d,\kappa_{\text{in}},\kappa_{\text{out}})$

Problem: Given locations $\mathbb{X}$, weights $\mathbf{W}$, and graph $\mathbf{A}$, recover $ \sigma_n$ exactly.

Prior work

Abbe, E., Baccelli, F., and Sankararaman, A., 2021. Community detection on Euclidean random graphs. Information and Inference: A Journal of the IMA, 10(1), 109-160.
Gaudio, J., Niu, X. and Wei, E., 2024. Exact community recovery in the geometric SBM. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2158-2184.
Gaudio, J., Guan, C., Niu, X. and Wei, E., 2024. Exact Label Recovery in Euclidean Random Graphs. arXiv preprint arXiv:2407.11163.

Main results

Torus $ S=\left[\frac{-1}{2},\frac{1}{2}\right]^d$

$\mathbf{A}$ $\sim GKBM(\lambda,n,d,\kappa_{\text{in}},\kappa_{\text{out}})$

Define $s = \max \{\text{supp}(\kappa_{\text{in}}),\text{supp}(\kappa_{\text{out}})\} <\infty, \text{ and }$

\[\mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}}) := \inf_{w_0}\ \mathbb{E}_W[I(w_0,W)]\]

\[I(w_0,w) = 2 \int_{\mathbb{R}^d} \Big(1-\sqrt{\kappa_{\text{in}}(x) \kappa_{\text{out}}(x)}-\sqrt{\big(1-\kappa_{\text{in}}(x)\big)\big(1-\kappa_{\text{out}}(x)\big)} \Big) dx,\]

with $\kappa_{\text{in}}(x) \equiv \kappa_{\text{in}}(x,w_0,w)$ and $ \kappa_{\text{out}}(x)\equiv \kappa_{\text{out}}(x,w_0,w)$

Converse: If $\lambda s<1$ or $\lambda \mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}}) < 1$, then exact recovery is not possible using any algorithm.

Achievability: If $\lambda s >1$ and $\lambda \mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}})> 1$, then there exists a linear time algorithm (in the number of edges) achieving exact recovery.

Impossibility: Idea

If $\lambda s<1$ or $\lambda \mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}}) < 1$ , then exact recovery is not possible. \[\mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}}) := \inf_{w_0}\ \mathbb{E}_W[I(w_0,W)],\] \[I(w_0,w) = 2 \int_{\mathbb{R}^d} \Big(1-\sqrt{\kappa_{\text{in}}(x) \kappa_{\text{out}}(x)}-\sqrt{\big(1-\kappa_{\text{in}}(x)\big)\big(1-\kappa_{\text{out}}(x)\big)} \Big) dx,\]

Genie-based estimator: Given the communities of all nodes except the origin, find its community.

Hypothesis testing problem: governed by the Chernoff-Hellinger divergence.

Prob. of error $\ge n^{-\lambda \mathbb{E}_W[I(w_0,W)]}$

Under some assumptions on the weight distribution, this can be strengthened to
Error probability $ \ge n^{-\lambda \mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}})} $

Total number of errors $\approx \lambda n^{1-\lambda \mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}})} \to \infty$ when $\lambda \mathcal{I}(\kappa_{\text{in}},\kappa_{\text{out}}) < 1$

Model-1d

Torus $ S=\left[\frac{-1}{2},\frac{1}{2}\right]$

$\mathbf{A}$ $\sim GKBM(\lambda,n,p,q,\varphi)$

Poisson point process $\mathbb{X}$$= \{X_u\}_{u=1}^N$ of intensity $\lambda n$

Two communities: $ \sigma$$ = \big(\sigma(1),\cdots,\sigma(N)\big)$ \[\mathbf{P}(\sigma(u)=\color{#f86d6d}{+1})=\mathbf{P}(\sigma(u)=\color{#ffde20}{-1}) = \frac{1}{2}\]
Geometric kernel: $\varphi: S \times S \to [0,1]$

Connection probabilities:
1. Within community: $p$
2. Between community: $q$

Given locations $\mathbb{X}$ and communities $ \sigma$

\[A_{uv}=1 \text{ w.p. }\begin{cases} p \varphi\Big(\|X_u-X_v\|\Big) & \text{if } \sigma(u)=\sigma(v)\\ q \varphi \Big(\|X_u-X_v\|\Big) & \text{if } \sigma(u) \neq \sigma(v) \end{cases}\]