Consistency between ordering and clustering methods for graphs

Tatsuro Kawamoto, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
Masaki Ochi, Department of Physics, The University of Tokyo, Chiba, Japan
Teruyoshi Kobayashi, Department of Economics, Kobe University, Kobe, Japan

August 30, 2022

Abstract

A relational dataset is often analyzed by optimally assigning a label to each element through clustering or ordering. Although clustering and ordering methods can yield similar characterizations of a dataset, the former has been studied much more actively than the latter, particularly for data represented as graphs. This study fills this gap by investigating methodological relationships between several clustering and ordering methods, focusing on spectral techniques. Furthermore, we evaluate the resulting performance of the clustering and ordering methods. To this end, we propose a measure called the label continuity error, which generically quantifies the degree of consistency between a sequence and a partition of a set of elements. Based on synthetic and real-world datasets, we evaluate the extent to which an ordering method identifies a module structure and a clustering method identifies a banded structure.

I Introduction

Identifying macroscopic connection patterns in graphs is a major challenge in network science. A number of algorithms have been proposed to extract different features, such as community structure [1, 2, 3], hierarchical community structure [4, 5], core-periphery structure [6], nested structure [7], and banded structure [8, 9, 10], to name a few.

When a graph consists of subgraphs in each of which vertices are densely connected, the graph is said to have a community structure. A common approach for extracting a community structure is graph partitioning, termed community detection [2, 3] or graph clustering [1]. In this approach, an algorithm assigns a group label to each vertex such that vertices with the same group label are densely connected. Alternatively, we can identify densely connected vertices through an ordering method that infers the optimal ordering of vertices such that vertices close to each other in the sequence are densely connected. The corresponding optimization problems are collectively termed the minimum linear arrangement [11, 12, 13] or envelope reduction [14, 15], and the inferred structural property is called a banded structure or sequentially local structure [10]. As exemplified in Fig. 1, densely connected vertices can be clearly identified by visualizing the graph and its adjacency matrix under an appropriate vertex ordering.

Despite the similarity between these two approaches, the clustering problem has received considerably more attention in the literature. Figure 2 shows the number of articles with keywords that represent ordering (pink bars) or clustering (blue bars) problems. Most of the keywords for ordering problems represent more general matrix ordering problems rather than vertex ordering problems for graphs (i.e., adjacency matrices), whereas the keywords for clustering problems mostly capture problems for graphs. Clustering methods have been studied and applied much more actively than ordering methods.

Figure 1: Simple example of a graph (left) and its adjacency matrix (right) for identifying a community structure through the optimal ordering of vertices without specifying the group labels. The $(i, j)$ element of the adjacency matrix is one (highlighted) when vertices $i$ and $j$ are connected, and zero (not highlighted) otherwise.

Figure 2: Number of articles with a keyword related to the ordering (pink bars) or clustering (blue bars) problems in the title or abstract. The data were collected from Dimensions [16] on May 30, 2022.

Spectral methods are popular for both ordering and clustering problems; they are termed spectral ordering [14, 15, 8] and spectral clustering [17, 18, 19], respectively. In both methods, the leading eigenvector(s) of a Laplacian or one of its variants are used to identify the optimal ordering or clustering of vertices. In particular, when a graph is partitioned into two groups by sorting the eigenvector elements [17], the result of spectral clustering is generally consistent with the vertex sequence inferred by spectral ordering.

However, spectral ordering and clustering algorithms are not generally consistent. For instance, when graphs are partitioned into more than two groups, it is common to employ the K-means algorithm [20] on $K (> 2)$ leading eigenvectors to achieve a $K$ -way partitioning [19]. By contrast, to identify the optimal vertex sequence using the spectral ordering method, we always use the eigenvector associated with the second leading eigenvalue. Therefore, it is nontrivial to determine the extent to which the two methods are quantitatively consistent. Even when we partition a graph into two groups, the result of spectral clustering may not be consistent with the vertex sequence obtained by spectral ordering when the K-means algorithm is used to obtain a partition.

We conduct a systematic investigation to evaluate the consistency between the spectral ordering and clustering methods. We first introduce a generic measure, referred to as the label continuity error (LCE), to quantify the discrepancy between a sequence and a partition of a set of elements (e.g., the vertices of a graph). Intuitively, a sequence and a partition are more consistent with each other if, for a given number of groups, the group label flips less often when the elements are followed in the specified order. We provide a more precise definition in the next section. Although we use this measure throughout the study, it is not the only way of quantifying consistency; we revisit this point in Sec. V.

There are also several modern spectral clustering algorithms with unexplored ordering counterparts. These include the methods based on the modularity matrix [21, 22], Bethe Hessian [23, 24], and regularized Laplacian [25, 26, 27, 28, 29]. To fill this gap, we show how spectral ordering algorithms can be derived from optimization problems using the matrices on which these modern spectral clustering methods are based. Spectral ordering problems based on these matrices are formulated as variants of the classical spectral ordering problem [14, 15] with different penalty terms and/or constraints.

The remainder of this paper is organized as follows. Section II formally introduces the LCE to quantitatively evaluate the consistency between ordering and clustering methods and examines its properties. Section III formulates spectral ordering methods corresponding to existing spectral clustering methods for graphs. Using the LCE introduced in Sec. II and the methods formulated in Sec. III, we analyze the consistency between spectral ordering and clustering methods using synthetic and real-world networks in Sec. IV. Finally, Sec. V discusses the results of this study.

II Label continuity error

Let $G(V, E)$ be a graph, where $V$ is the vertex set and $E$ is the edge set. We assume that every vertex in the graph is distinguishable and let $I = \{1, \dots, N\}$ be the original sequence of the vertex set $V$. For vertex $i \in I$, we denote by $\pi(i) = \pi_i \in \{1, \dots, N\}$ its index after permutation (i.e., we use $\pi$ as both a mapping and a variable) and by $\boldsymbol{\pi} = \{\pi(i) \mid i \in I\}$ the reordered sequence of the vertices. Similarly, we denote by $\sigma(i) = \sigma_i \in \{1, \dots, K\}$ the group label of vertex $i$ and by $\boldsymbol{\sigma} = \{\sigma(i) \mid i \in I\}$ the partition of the vertex set. We also denote $V_k = \{i \mid \sigma_i = k, i \in I\}$ and $N_k = |V_k|$ for group $k$ (we let $\{N_1, \dots, N_K\} =: \{N_k\}$). Throughout this study, $\hat{\boldsymbol{\pi}}$ and $\hat{\boldsymbol{\sigma}}$ represent the sequence and partition inferred by algorithms, respectively.

II.1 Definition

We introduce a measure to quantify the consistency between a sequence $π$ and partition $σ$ . We define the sequence $π$ as consistent with $σ$ if vertices with the same group label are maximally adjacent to each other in the sequence $π$ . For instance, if the original indices $I$ are consistent with group labels $σ$ ,

$$\sum_{i=1}^{N-1} \delta\left(\sigma(i), \sigma(i+1)\right) \qquad (1)$$

is maximized, where $δ (a, b)$ represents the Kronecker delta; Fig. 3(a) presents an example. To evaluate the consistency between $π$ and $σ$ , we introduce a measure that we refer to as the label continuity, defined by

$$C(\boldsymbol{\pi}, \boldsymbol{\sigma}) = \frac{1}{N-1}\sum_{i'=1}^{N-1}\delta\left(\sigma(\pi^{-1}(i')), \sigma(\pi^{-1}(i'+1))\right), \qquad (2)$$

where $\pi^{-1}$ is the inverse mapping of $\pi$ and $i'$ is the index label after the permutation; that is, $\pi^{-1}(i')$ is the label $i$ in the original indices satisfying $\pi(i) = i'$. The number of times that the group labels are flipped when following the vertices in the order of $\boldsymbol{\pi}$ is $(N-1)\left(1 - C(\boldsymbol{\pi}, \boldsymbol{\sigma})\right)$, and the group labels must be flipped at least $K - 1$ times. Considering this feature, we define the label continuity error (LCE) as

$$\Delta(\boldsymbol{\pi}, \boldsymbol{\sigma}) = 1 - \frac{K-1}{N-1} - C(\boldsymbol{\pi}, \boldsymbol{\sigma}). \qquad (3)$$

Hereafter, we abbreviate $C(\boldsymbol{\pi}, \boldsymbol{\sigma})$ and $\Delta(\boldsymbol{\pi}, \boldsymbol{\sigma})$ as $C$ and $\Delta$, respectively, when there is no risk of confusion.

For a given partition $\boldsymbol{\sigma}$ and different vertex sequences, we can use the LCE to evaluate which vertex sequence is more consistent with $\boldsymbol{\sigma}$ (e.g., Figs. 3(a) and 3(b)). Similarly, for a given sequence $\boldsymbol{\pi}$ and different partitions, we can evaluate which partition is more consistent with $\boldsymbol{\pi}$, keeping the group sizes $\{N_k\}$ fixed (e.g., Figs. 3(a) and 3(c)).
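To make Eqs. (2) and (3) concrete, here is a minimal Python sketch. It assumes (our convention, not one fixed by the text) that vertices are indexed from 0, that `pi[i]` holds the 1-based position $\pi(i)$ of vertex `i`, and that `sigma[i]` holds its group label; $K$ is taken to be the number of distinct labels in `sigma`. The function names are hypothetical, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def label_continuity(pi, sigma):
    """Label continuity C(pi, sigma) of Eq. (2).

    pi[i]    -- 1-based position of vertex i in the sequence
    sigma[i] -- group label of vertex i
    """
    n = len(pi)
    # Invert the permutation: order[p] is the vertex at 0-based position p,
    # i.e., order[i' - 1] = pi^{-1}(i').
    order = [0] * n
    for vertex, position in enumerate(pi):
        order[position - 1] = vertex
    # Count consecutive positions whose vertices share a group label.
    same = sum(sigma[order[p]] == sigma[order[p + 1]] for p in range(n - 1))
    return Fraction(same, n - 1)


def label_continuity_error(pi, sigma):
    """Label continuity error Delta(pi, sigma) of Eq. (3)."""
    n = len(pi)
    k = len(set(sigma))  # number of groups K
    return 1 - Fraction(k - 1, n - 1) - label_continuity(pi, sigma)


# Two groups of three vertices: vertices 0-2 in group 1, vertices 3-5 in group 2.
sigma = [1, 1, 1, 2, 2, 2]
pi_blocked = [1, 2, 3, 4, 5, 6]      # groups kept contiguous
pi_alternating = [1, 3, 5, 2, 4, 6]  # visits vertices 0, 3, 1, 4, 2, 5
print(label_continuity_error(pi_blocked, sigma))      # 0
print(label_continuity_error(pi_alternating, sigma))  # 4/5
```

The contiguous sequence attains the minimum $\Delta = 0$, whereas the alternating one flips the label at every step and reaches $1 - (K-1)/(N-1) = 4/5$.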

Figure 3: Examples of the label continuity $C(\boldsymbol{\pi}, \boldsymbol{\sigma})$ and the label continuity error $\Delta(\boldsymbol{\pi}, \boldsymbol{\sigma})$ for different sequences and partitions of the same vertex set. The number on each vertex represents the original index of the vertex.

II.2 Properties of the LCE

The LCE can take only small values when the number of groups $K$ is very small or very large. For example, $\Delta$ is obviously zero when $K = 1$ or $K = N$. In other words, the resolution of the LCE is low in these regions. Moreover, this property would depend on the distribution of the group sizes $\{N_k\}$. In this section, we quantify these intuitions.

The minimum value of $\Delta$ is zero by construction. The maximum value of $\Delta$ is obtained when labels are flipped the maximum number of times. Maximizing $\Delta$ over the sequence $\boldsymbol{\pi}$ for an arbitrary partition $\boldsymbol{\sigma}$ with group sizes $\{N_k\}$ is equivalent to maximizing $\Delta$ over partitions $\boldsymbol{\sigma}$ (constrained to $\{N_k\}$) for a given sequence $\boldsymbol{\pi}$. We denote the maximum by $\max\Delta$:

$$\max_{\boldsymbol{\pi}}\Delta(\boldsymbol{\pi}, \boldsymbol{\sigma}) = \max_{\boldsymbol{\sigma}(\{N_k\})}\Delta(\boldsymbol{\pi}, \boldsymbol{\sigma}) = \max\Delta. \qquad (4)$$

As derived in Appendix A, we have

$$\max\Delta = \begin{cases} \dfrac{2(N - \max_k N_k)}{N-1} - \dfrac{K-1}{N-1} & \left(\max_k N_k > \lceil N/2 \rceil\right), \\ 1 - \dfrac{K-1}{N-1} & (\text{otherwise}), \end{cases} \qquad (5)$$

where $⌈ \cdot ⌉$ denotes the ceiling function.
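Equation (5) can be checked by brute force on small examples. The sketch below (hypothetical helper names; group sizes are assumed to be positive integers) enumerates every distinct label sequence with the given group sizes, which by Eq. (4) is the same as maximizing over sequences for a fixed partition, and compares the result with the closed form. The enumeration grows factorially, so it is only meant for very small $N$.

```python
from fractions import Fraction
from itertools import permutations


def lce_along(labels):
    """LCE (Eq. (3)) of a label sequence that is already read in sequence order."""
    n, k = len(labels), len(set(labels))
    same = sum(a == b for a, b in zip(labels, labels[1:]))
    return 1 - Fraction(k - 1, n - 1) - Fraction(same, n - 1)


def max_lce_closed_form(sizes):
    """Upper bound max Delta of Eq. (5) for group sizes {N_k}."""
    n, k, largest = sum(sizes), len(sizes), max(sizes)
    if largest > (n + 1) // 2:  # (n + 1) // 2 equals ceil(n / 2)
        return Fraction(2 * (n - largest), n - 1) - Fraction(k - 1, n - 1)
    return 1 - Fraction(k - 1, n - 1)


def max_lce_brute_force(sizes):
    """Maximize Delta over all distinct label sequences with the given sizes."""
    labels = [g for g, s in enumerate(sizes) for _ in range(s)]
    return max(lce_along(seq) for seq in set(permutations(labels)))


for sizes in [(3, 3), (4, 2), (5, 2), (2, 2, 2), (4, 1, 1)]:
    print(sizes, max_lce_closed_form(sizes), max_lce_brute_force(sizes))
```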

We next investigate the statistical properties of the LCE. First, we calculate the probability $P(m)$ that the number of consecutive vertex pairs in a sequence sharing the same group label equals $m$, where $m = (N-1)C(\boldsymbol{\pi}, \boldsymbol{\sigma})$. When every sequence occurs with equal probability, we have

$$P(m) = \frac{1}{N!}\sum_{\boldsymbol{\pi}'}\delta\left(m, (N-1)C(\boldsymbol{\pi}', \boldsymbol{\sigma})\right), \qquad (6)$$

where $\boldsymbol{\sigma}$ is an arbitrary partition with group sizes $\{N_k\}$ and the sum runs over all possible sequences ($|\{\boldsymbol{\pi}'\}| = N!$). Note that Eq. (6) is also the distribution obtained when each distinguishable partition occurs with equal probability. This equivalence might sound peculiar because there are only $N!/\prod_{k=1}^{K} N_k!$ distinguishable partitions, whereas there are $N!$ possible sequences. However, because every distinguishable partition is counted exactly $\prod_{k=1}^{K} N_k!$ times in the summation of Eq. (6), the distribution $P(m)$ is identical for random sequences and random partitions.
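The equivalence between random sequences and random partitions can be verified numerically for a tiny partition. In this sketch (helper names are hypothetical), the first function enumerates all $N!$ vertex sequences for a fixed `sigma`, and the second fixes the sequence and enumerates the $N!/\prod_k N_k!$ distinguishable label assignments with the same group sizes.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def same_label_pairs(labels):
    """m = number of consecutive positions with identical labels."""
    return sum(a == b for a, b in zip(labels, labels[1:]))


def p_m_random_sequences(sigma):
    """P(m) of Eq. (6): fixed partition sigma, all N! vertex sequences equally likely."""
    n = len(sigma)
    counts = Counter(
        same_label_pairs([sigma[v] for v in order])  # labels read along the sequence
        for order in permutations(range(n))
    )
    total = sum(counts.values())  # equals N!
    return {m: Fraction(c, total) for m, c in counts.items()}


def p_m_random_partitions(sigma):
    """Same quantity with the sequence fixed and every distinguishable
    partition with the same group sizes equally likely."""
    partitions = set(permutations(sigma))  # N! / prod_k N_k! of them
    counts = Counter(same_label_pairs(p) for p in partitions)
    return {m: Fraction(c, len(partitions)) for m, c in counts.items()}


sigma = [0, 0, 0, 1, 1, 2]
print(p_m_random_sequences(sigma) == p_m_random_partitions(sigma))  # True
```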

Although Eq. (6) is a straightforward expression, the strict constraint on $\{N_k\}$ complicates analytical calculations. Therefore, as an approximation, we instead calculate the distribution of bootstrapped group labels $\boldsymbol{\sigma}^{*}$. That is, we generate a random group assignment $\boldsymbol{\sigma}^{*}$ by sampling each label independently from the empirical distribution $\mathrm{Prob}[k] = N_k/N$ ($k \in \{1, \dots, K\}$); in other words, we randomly resample group labels from $\boldsymbol{\sigma}$ with replacement. The distribution of group labels $\boldsymbol{\sigma}^{*}$ is

$$P(\boldsymbol{\sigma}^{*}) = \prod_{i=1}^{N}\frac{N_{\sigma^{*}(i)}}{N}. \qquad (7)$$

This approximation for random group labels is expected to be accurate if every group size $N_k$ is sufficiently large.

Using the bootstrapped group labels, the mean value of $C$ is obtained as

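$$E[C] = \sum_{\boldsymbol{\sigma}^{*}} P(\boldsymbol{\sigma}^{*})\,\frac{\sum_{i=1}^{N-1}\delta\left(\sigma^{*}(i), \sigma^{*}(i+1)\right)}{N-1} = \sum_{k=1}^{K}\left(\frac{N_k}{N}\right)^2. \qquad (8)$$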

Therefore, the mean value of the LCE under random partitioning is

$$\bar{\Delta}(\{N_k\}) := E[\Delta] = \frac{N-K}{N-1} - \sum_{k=1}^{K}\left(\frac{N_k}{N}\right)^2. \qquad (9)$$

Because, in practice, the LCE does not become worse than $\bar{\Delta}(\{N_k\})$, this mean value is a more meaningful reference value than the upper bound in Eq. (5).
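The identity in Eq. (8) follows because each of the $N-1$ consecutive pairs shares a label with probability $\sum_k (N_k/N)^2$ under independent resampling. The sketch below (hypothetical names) confirms this by summing over all $K^N$ bootstrapped label vectors, weighted by Eq. (7), for a small example, and evaluates Eq. (9).

```python
from fractions import Fraction
from itertools import product
from math import prod


def mean_lce(sizes):
    """E[Delta] under bootstrapped labels, Eq. (9)."""
    n, k = sum(sizes), len(sizes)
    return Fraction(n - k, n - 1) - sum(Fraction(s, n) ** 2 for s in sizes)


def mean_c_by_enumeration(sizes):
    """E[C] obtained by summing over all K^N bootstrapped label vectors, Eq. (8)."""
    n, k = sum(sizes), len(sizes)
    probs = [Fraction(s, n) for s in sizes]  # Prob[k] = N_k / N
    expectation = Fraction(0)
    for labels in product(range(k), repeat=n):
        weight = prod(probs[g] for g in labels)  # P(sigma*) of Eq. (7)
        same = sum(a == b for a, b in zip(labels, labels[1:]))
        expectation += weight * Fraction(same, n - 1)
    return expectation


sizes = (3, 2, 1)
print(mean_c_by_enumeration(sizes))  # 7/18, i.e., sum_k (N_k / N)^2
print(mean_lce(sizes))               # 19/90
```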

We can also derive the variance $\mathrm{Var}[\Delta]$ (the derivation is shown in Appendix B) as

$$\begin{aligned} \mathrm{Var}[\Delta] &= \frac{1}{N-1}\sum_{k=1}^{K}\left(\frac{N_k}{N}\right)^2 + \frac{2(N-2)}{(N-1)^2}\sum_{k=1}^{K}\left(\frac{N_k}{N}\right)^3 \\ &\quad - \frac{3N-5}{(N-1)^2}\left(\sum_{k=1}^{K}\left(\frac{N_k}{N}\right)^2\right)^2, \end{aligned} \qquad (10)$$

showing that $Δ$ converges to $E [Δ]$ by the law of large numbers. Furthermore, in Appendix C, we show that the probability distribution is asymptotically normal when the group sizes are equal and $K = O (1)$ , implying that higher-order moments will vanish.
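Because $\Delta$ differs from $-C$ only by a constant, $\mathrm{Var}[\Delta] = \mathrm{Var}[C]$, and Eq. (10) can be compared with an exhaustive sum over bootstrapped labels for small $N$. A minimal sketch with hypothetical helper names:

```python
from fractions import Fraction
from itertools import product
from math import prod


def var_lce(sizes):
    """Var[Delta] under bootstrapped labels, Eq. (10)."""
    n = sum(sizes)
    s2 = sum(Fraction(s, n) ** 2 for s in sizes)
    s3 = sum(Fraction(s, n) ** 3 for s in sizes)
    return (s2 / (n - 1)
            + Fraction(2 * (n - 2), (n - 1) ** 2) * s3
            - Fraction(3 * n - 5, (n - 1) ** 2) * s2 ** 2)


def var_lce_by_enumeration(sizes):
    """Var[Delta] = Var[C], summing over all K^N bootstrapped label vectors."""
    n, k = sum(sizes), len(sizes)
    probs = [Fraction(s, n) for s in sizes]
    m1 = Fraction(0)  # first moment of C
    m2 = Fraction(0)  # second moment of C
    for labels in product(range(k), repeat=n):
        weight = prod(probs[g] for g in labels)
        c = Fraction(sum(a == b for a, b in zip(labels, labels[1:])), n - 1)
        m1 += weight * c
        m2 += weight * c * c
    return m2 - m1 ** 2


print(var_lce((3, 2, 1)), var_lce_by_enumeration((3, 2, 1)))
```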

Let us summarize the results obtained in this section. As the number of groups $K$ ($> 1$) increases, the upper bound of the LCE ($\max\Delta$ in Eq. (5)) decreases monotonically as long as the partitions are not highly skewed, i.e., $\max_k N_k \leq \lceil N/2 \rceil$. However, as illustrated in Fig. 4, the mean LCE for a random sequence ($\bar{\Delta}$ in Eq. (9)) is a concave function of $K$. When $K$ is small, the LCE increases with $K$ because the chance of label flips increases, whereas when $K$ is large, the LCE decreases with $K$ owing to the increase in the minimum number of label flips. For equipartitioning (Fig. 4(a)), the mean LCE $\bar{\Delta}$ peaks at an integer close to $K = \sqrt{N-1}$. As a partition becomes more skewed (Fig. 4(b)), $\max\Delta$ and $\bar{\Delta}$ peak at larger values of $K$. Therefore, when evaluating the LCE, we must apply appropriate normalizations.
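Assuming exact equipartitioning with $N_k = N/K$ (and ignoring that $N/K$ need not be an integer), Eq. (9) reduces to $\bar{\Delta} = (N-K)/(N-1) - 1/K$, whose continuous maximizer satisfies $1/K^2 = 1/(N-1)$, i.e., $K = \sqrt{N-1}$. A short sketch (hypothetical names) that locates the integer peak:

```python
from fractions import Fraction


def mean_lce_equal_sizes(n, k):
    """Eq. (9) with N_k = N / K for every group, so sum_k (N_k / N)^2 = 1 / K."""
    return Fraction(n - k, n - 1) - Fraction(1, k)


def peak_k(n):
    """Integer K in [1, N] that maximizes the mean LCE of an equipartition."""
    return max(range(1, n + 1), key=lambda k: mean_lce_equal_sizes(n, k))


for n in (50, 101, 401):
    print(n, peak_k(n), (n - 1) ** 0.5)
```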

In this study, we focus on comparing partitions with the same number of groups $K$ . In Appendix D, however, we discuss nested partitions (subpartitions of another partition) as an example in which different partitions have different numbers of groups.

III Spectral ordering methods

In this section, we describe variants of spectral ordering methods using different matrices. After reviewing the derivation of the standard methods based on the unnormalized and normalized Laplacians, we show how spectral ordering problems can be formulated with the modularity matrix, regularized Laplacian, and Bethe Hessian.

III.1 Unnormalized Laplacian

Spectral ordering is derived as a continuous relaxation of the discrete optimization problem called envelope reduction [14]. This problem optimizes the vertex sequence $π$ such that each connected pair of vertices is located close to each other in the sequence. To this end, the following objective function is considered:

$$H_2(\boldsymbol{\pi}; A) = \frac{1}{2}\sum_{i,j} A_{ij}(\pi_i - \pi_j)^2, \qquad (11)$$

which is the sum of squared distances $(\pi_i - \pi_j)^2$ over all pairs of connected vertices. The sequence that minimizes this function is the solution to envelope reduction.
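A minimal sketch of Eq. (11) and of envelope reduction by exhaustive search, with hypothetical helper names and the graph given as an undirected edge list. The $N!$ search illustrates why a continuous relaxation is needed for anything but toy graphs.

```python
from itertools import permutations


def h2(pi, edges):
    """H_2(pi; A) of Eq. (11) for an undirected edge list.

    pi[i] is the 1-based position of vertex i. Summing over each undirected
    edge once matches the factor 1/2 in Eq. (11), which cancels the double
    counting of (i, j) and (j, i) in the symmetric matrix A.
    """
    return sum((pi[i] - pi[j]) ** 2 for i, j in edges)


def envelope_reduction_brute_force(n, edges):
    """Minimize H_2 over all N! sequences; only feasible for very small N."""
    best = min(permutations(range(1, n + 1)), key=lambda pi: h2(pi, edges))
    return best, h2(best, edges)


# Toy input: the path 0-2-4-1-3, i.e., a path whose vertex labels are scrambled.
edges = [(0, 2), (2, 4), (4, 1), (1, 3)]
pi_best, cost = envelope_reduction_brute_force(5, edges)
print(pi_best, cost)  # cost 4: every edge joins neighbours in the sequence
```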

As the minimization of Eq. (11) is not computationally feasible, we consider its continuous relaxation. That is, we represent $\boldsymbol{\pi}$ using a continuous vector $\boldsymbol{x} \in \mathbb{R}^N$. However, if we simply replace $\boldsymbol{\pi}$ with $\boldsymbol{x}$, $\boldsymbol{x} = \boldsymbol{0}$ would be the trivial minimizer of $H_2(\boldsymbol{x}; A)$. Thus, we constrain $\boldsymbol{x}$ such that $\sum_{i=1}^{N} x_i^2$ is a positive constant (i.e., the spherical constraint), reflecting the fact that $\sum_{i=1}^{N} \pi_i^2$ is positive regardless of the choice of sequence. Therefore, we consider the minimization of the following function:

$$\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 - \lambda\left(\sum_{i=1}^{N} x_i^2 - 1\right), \qquad (12)$$

where $λ$ is the Lagrange multiplier. The extremum condition in Eq. (12) yields the following eigenvalue equation with an eigenvector $ν$ :

$$L\boldsymbol{\nu} = \lambda\boldsymbol{\nu}. \qquad (13)$$

Here, $L \equiv D - A$ is the unnormalized (or combinatorial) Laplacian, where $D = \mathrm{diag}(d_1, \dots, d_N)$ is the degree matrix ($d_i = \sum_{j=1}^{N} A_{ij}$). Although a vector proportional to $\boldsymbol{1}$ (a vector of ones) minimizes Eq. (12) and is also the eigenvector associated with the smallest eigenvalue of $L$, we cannot infer the optimal sequence from $\boldsymbol{1}$ because all of its elements are identical. Therefore, we exclude vectors proportional to $\boldsymbol{1}$, which is equivalent to imposing a perpendicular constraint to $\boldsymbol{1}$ in Eq. (12), i.e., $\sum_{i=1}^{N} x_i = 0$. Then, the minimizer of the objective function is the eigenvector $\boldsymbol{\nu}_2$ of $L$ associated with the second-smallest eigenvalue.

The estimate of the optimal sequence $\hat{\boldsymbol{\pi}}$ obtained by the spectral ordering method is

$$\hat{\boldsymbol{\pi}} = \{\mathrm{rank}(\nu_{2i}) \mid i \in I\}, \qquad (14)$$

where $\nu_{2i}$ is the $i$th element of $\boldsymbol{\nu}_2$, and $\mathrm{rank}(\nu_{2i})$ is the index of $\nu_{2i}$ in an array in which the elements of $\boldsymbol{\nu}_2$ are sorted in ascending or descending order.
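The following is a minimal dense sketch of Eq. (14), assuming NumPy and `numpy.linalg.eigh` (which returns eigenvalues in ascending order); `rank_positions` and `spectral_ordering_unnormalized` are hypothetical names, and the toy graph is a path whose vertex labels have been shuffled.

```python
import numpy as np


def rank_positions(values):
    """1-based position of each entry after an ascending sort (the rank in Eq. (14))."""
    order = np.argsort(values, kind="stable")
    positions = np.empty(len(values), dtype=int)
    positions[order] = np.arange(1, len(values) + 1)
    return positions


def spectral_ordering_unnormalized(A):
    """Sequence estimate of Eq. (14) from the eigenvector nu_2 of L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    nu2 = eigvecs[:, 1]             # second-smallest eigenvalue
    return rank_positions(nu2)


def h2(pi, A):
    """Objective H_2(pi; A) of Eq. (11)."""
    pi = np.asarray(pi, dtype=float)
    return 0.5 * float(np.sum(A * (pi[:, None] - pi[None, :]) ** 2))


# Toy input: a path graph whose vertex labels have been shuffled.
n = 8
path = np.random.default_rng(0).permutation(n)  # path[p] = label of the p-th vertex on the path
A = np.zeros((n, n))
for p in range(n - 1):
    A[path[p], path[p + 1]] = A[path[p + 1], path[p]] = 1.0

pi_hat = spectral_ordering_unnormalized(A)
print(pi_hat, h2(pi_hat, A))
```

For a path graph, $\boldsymbol{\nu}_2$ is strictly monotone along the path, so the recovered sequence attains the minimum $H_2 = N - 1$ (up to reversal).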

III.2 Normalized Laplacian

A spectral ordering method with the normalized Laplacian was derived in [15]. Note that the objective function (11) does not have a periodic boundary condition. Therefore, while the distance from one vertex at the end of the sequence to another vertex ranges from $1$ to $N - 1$ , the distance from the vertex at the middle of the sequence ranges from $1$ to $⌊ N / 2 ⌋$ , where $⌊ \cdot ⌋$ is the floor function. This implies that when a graph has a vertex with a considerably large degree (i.e., a hub), it is typically more beneficial for the minimization objective to assign such a vertex near the middle of the sequence. To incorporate this feature, we replace the spherical constraint in Eq. (12) with the following ellipsoidal constraint:

$$\sum_{i=1}^{N} d_i x_i^2 = \text{const.}, \qquad (15)$$

which tends to restrict $x_i$ with a large $d_i$ to be relatively small (recall that a variable with a large coefficient typically takes relatively small values on an ellipsoid). Therefore, Eq. (15) constrains $\boldsymbol{x}$ such that $x_i$ of a hub vertex $i$ is near the origin, and when $\boldsymbol{x}$ is discretized, the hub vertices are likely to be located near the middle of the sequence. Note also that the mean of $\{x_i\}$ is located at the origin because of the perpendicular constraint $\sum_{i=1}^{N} x_i = 0$.

Consequently, Eq. (13) is replaced with the following generalized eigenvalue equation with respect to its second-smallest eigenvalue $λ_{2}$ :

$$L\boldsymbol{\nu}_2 = \lambda_2 D\boldsymbol{\nu}_2. \qquad (16)$$

This is equivalent to

$$\mathcal{L}\boldsymbol{z}_2 = \lambda_2\boldsymbol{z}_2, \qquad (17)$$

where $\mathcal{L} \equiv D^{-1/2} L D^{-1/2}$ is the normalized Laplacian and $\boldsymbol{z}_2 \equiv D^{1/2}\boldsymbol{\nu}_2$. As $\boldsymbol{\nu}_2$ is a continuous relaxation of the sequence $\boldsymbol{\pi}$, we estimate the optimal sequence $\hat{\boldsymbol{\pi}}$ as

$$\hat{\boldsymbol{\pi}} = \{\mathrm{rank}(d_i^{-1/2} z_{2i}) \mid i \in I\}. \qquad (18)$$
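A minimal dense sketch of Eqs. (17) and (18), assuming NumPy and a graph without isolated vertices; the helper names and the toy graph (two 4-cliques joined by a single edge) are hypothetical. It also checks that the back-transformed vector solves the generalized problem of Eq. (16).

```python
import numpy as np


def rank_positions(values):
    """1-based position of each entry after an ascending sort."""
    order = np.argsort(values, kind="stable")
    positions = np.empty(len(values), dtype=int)
    positions[order] = np.arange(1, len(values) + 1)
    return positions


def two_cliques():
    """Toy graph: 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


def spectral_ordering_normalized(A):
    """Sequence estimate of Eq. (18); assumes every degree d_i is positive."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    s = 1.0 / np.sqrt(d)                                     # diagonal of D^{-1/2}
    norm_lap = np.eye(len(d)) - s[:, None] * A * s[None, :]  # D^{-1/2} L D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(norm_lap)
    z2 = eigvecs[:, 1]                                       # Eq. (17)
    nu2 = s * z2                                             # nu_2 = D^{-1/2} z_2
    return rank_positions(nu2), eigvals[1], nu2


A = two_cliques()
pi_hat, lam2, nu2 = spectral_ordering_normalized(A)
D = np.diag(A.sum(axis=1))
# nu2 should satisfy the generalized problem of Eq. (16): L nu2 = lam2 D nu2
print(np.allclose((D - A) @ nu2, lam2 * D @ nu2))
print(np.argsort(pi_hat))  # vertices listed in the estimated order
```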

III.3 Modularity matrix

The modularity matrix $Q$ appears in the spectral clustering method for modularity maximization in community detection [21]. The matrix element is commonly defined as

$$Q_{ij} = A_{ij} - \frac{d_i d_j}{2M}, \qquad (19)$$

where $M$ is the total number of edges in the graph.

To formulate the spectral ordering problem with the modularity matrix, we again consider the objective function $H_2(\boldsymbol{\pi}; A)$ in the envelope reduction problem and its continuous relaxation with the spherical constraint $\sum_{i=1}^{N} x_i^2 = 1$. Herein, we add the following penalty terms to the objective function:

$$\frac{\left(\sum_i d_i x_i\right)^2}{2M} - \sum_i d_i x_i^2. \qquad (20)$$

The penalty terms ensure that $\{x_i\}$ are “balanced” around the origin. The first term prevents $\{x_i\}$ for hub vertices from being located only on the positive or only on the negative side of the real interval $[-1, 1]$. Owing to the second term, $\{x_i\}$ associated with hub vertices also tend to be far from the origin. Therefore, the penalty term in Eq. (20) decreases when $\{x_i\}$ are more symmetrically distributed around the origin.

Using Lagrange multipliers, the objective function to be minimized is then

$$\begin{aligned} &\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 + \frac{\left(\sum_i d_i x_i\right)^2}{2M} - \sum_i d_i x_i^2 + \lambda\left(\sum_{i=1}^{N} x_i^2 - 1\right) \\ &= -\sum_{i,j} x_i Q_{ij} x_j + \lambda\left(\sum_{i=1}^{N} x_i^2 - 1\right). \end{aligned} \qquad (21)$$

Here, we do not impose the perpendicular constraint in Eq. (21) because a vector proportional to $\boldsymbol{1}$ is not a trivial minimizer. The extremum conditions of Eq. (21) yield

$$Q\boldsymbol{\nu}_1 = \lambda_1\boldsymbol{\nu}_1, \qquad (22)$$

where $\lambda_1$ represents the largest eigenvalue of $Q$ and $\boldsymbol{\nu}_1$ is the associated eigenvector. $\boldsymbol{\nu}_1$ is the minimizer of Eq. (21) provided that it is not proportional to $\boldsymbol{1}$. Analogously to Eq. (14), we estimate the optimal sequence $\hat{\boldsymbol{\pi}}$ as

$$\hat{\boldsymbol{\pi}} = \{\mathrm{rank}(\nu_{1i}) \mid i \in I\}. \qquad (23)$$
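A minimal dense sketch of Eqs. (19), (22), and (23), assuming NumPy; the helper names and the two-clique toy graph are hypothetical. It also shows why the perpendicular constraint is unnecessary here: $Q\boldsymbol{1} = \boldsymbol{0}$, so the all-ones vector has eigenvalue zero and is not the leading eigenvector when a positive eigenvalue exists.

```python
import numpy as np


def rank_positions(values):
    """1-based position of each entry after an ascending sort."""
    order = np.argsort(values, kind="stable")
    positions = np.empty(len(values), dtype=int)
    positions[order] = np.arange(1, len(values) + 1)
    return positions


def two_cliques():
    """Toy graph: 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


def spectral_ordering_modularity(A):
    """Sequence estimate of Eq. (23) from the leading eigenvector of Q (Eq. (19))."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    Q = A - np.outer(d, d) / d.sum()      # d.sum() equals 2M
    eigvals, eigvecs = np.linalg.eigh(Q)  # ascending order
    nu1 = eigvecs[:, -1]                  # largest eigenvalue lambda_1, Eq. (22)
    return rank_positions(nu1), Q, eigvals[-1]


A = two_cliques()
pi_hat, Q, lam1 = spectral_ordering_modularity(A)
print(np.allclose(Q @ np.ones(len(A)), 0.0))  # Q 1 = 0
print(lam1, np.argsort(pi_hat))
```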

The ellipsoidal constraint (15) of the normalized Laplacian forces $\{x_i\}$ for hub vertices to concentrate around the origin, whereas the penalty terms (20) force them to be evenly distributed at both the positive and negative ends of the real line. Therefore, the results of the spectral ordering methods using the normalized Laplacian and the modularity matrix are expected to differ considerably for graphs with heterogeneous degree distributions.

III.4 Bethe Hessian

The Bethe Hessian is another matrix originally formulated for spectral clustering [23, 24]. This method is inspired by the statistical inference of the stochastic block model, which is explained in Sec. IV.1. This section considers a spectral ordering method using the Bethe Hessian.

The derivation of spectral ordering with the Bethe Hessian is analogous to that with the normalized Laplacian. However, instead of imposing an ellipsoidal constraint (15), we introduce $\sum_{i = 1}^{N} d_{i} x_{i}^{2}$ as a penalty term. Thus, we consider the following objective function:

$$\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 + \tau\sum_{i=1}^{N} d_i x_i^2, \qquad (24)$$

where $\tau$ is an arbitrary constant (hyperparameter) that can be either positive or negative. To avoid the trivial minimizer $\boldsymbol{x} = \boldsymbol{0}$, we impose the spherical constraint $\sum_{i=1}^{N} x_i^2 = 1$.

Using Lagrange multipliers, the objective function to be minimized is then

$$\begin{aligned} &\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 + \tau\sum_i d_i x_i^2 - \lambda\left(\sum_{i=1}^{N} x_i^2 - 1\right) \\ &= (1+\tau)\sum_{i,j} x_i B_{ij} x_j - \lambda\left(\sum_{i=1}^{N} x_i^2 - 1\right), \end{aligned} \qquad (25)$$

where

$$B_{ij} = D_{ij} - rA_{ij} \quad \left(r = \frac{1}{1+\tau}\right) \qquad (26)$$

is an element of the Bethe Hessian. The extremum conditions of Eq. (25) yield an eigenvalue equation with respect to $B$.

We estimate the optimal sequence $\hat{\boldsymbol{\pi}}$ as follows:

$$\hat{\boldsymbol{\pi}} = \{\mathrm{rank}(\nu_{2i}) \mid i \in I\}, \qquad (27)$$

where $\boldsymbol{\nu}_2$ is the eigenvector associated with the second-smallest eigenvalue $\lambda_2$, i.e.,

$$B\boldsymbol{\nu}_2 = \lambda_2\boldsymbol{\nu}_2. \qquad (28)$$

Note that there is no guarantee that $\boldsymbol{\nu}_2$ always provides the best estimate in terms of $H_2(\hat{\boldsymbol{\pi}}; A)$ among all the eigenvectors. In fact, we confirmed that the eigenvector that yields the best estimate in terms of $H_2(\hat{\boldsymbol{\pi}}; A)$ (when we employ the rounding rule in Eq. (27)) depends sensitively on the value of $r$, particularly when $r$ is small (see Appendix S1 for details). However, we employ Eq. (27) because the estimate with $\boldsymbol{\nu}_2$ offers the smallest value of $H_2(\hat{\boldsymbol{\pi}}; A)$ as long as $r$ is sufficiently large.

Throughout this study, we set $r = \sqrt{\sum_i d_i^2 / \sum_i d_i - 1}$ ($> 1$) because it is a commonly employed value in spectral clustering. The hyperparameter $\tau$ is negative when $r > 1$. Thus, $\{x_i\}$ for hub vertices are aligned near the ends of the real line $[-1, 1]$ so that the sequence achieves a lower value of Eq. (24) through the penalty term. By contrast, when $r < 1$ ($\tau > 0$), $\{x_i\}$ for hub vertices are likely to be located near the origin, implying that the resulting sequence is similar to that obtained by the spectral ordering method based on the normalized Laplacian.
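A minimal dense sketch of Eqs. (26) to (28) with the default $r$ described above, assuming NumPy; the helper names and the two-clique toy graph are hypothetical. The identity shift $(r^2 - 1)I$ that appears in some definitions of the Bethe Hessian does not change eigenvectors, so it is omitted here as in Eq. (26).

```python
import numpy as np


def rank_positions(values):
    """1-based position of each entry after an ascending sort."""
    order = np.argsort(values, kind="stable")
    positions = np.empty(len(values), dtype=int)
    positions[order] = np.arange(1, len(values) + 1)
    return positions


def two_cliques():
    """Toy graph: 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


def spectral_ordering_bethe(A, r=None):
    """Sequence estimate of Eq. (27) from the Bethe Hessian B = D - rA."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    if r is None:
        # r = sqrt(sum_i d_i^2 / sum_i d_i - 1), the value used in the text
        r = np.sqrt((d ** 2).sum() / d.sum() - 1.0)
    B = np.diag(d) - r * A          # Eq. (26)
    _, eigvecs = np.linalg.eigh(B)  # ascending order
    nu2 = eigvecs[:, 1]             # second-smallest eigenvalue, Eq. (28)
    return rank_positions(nu2), r


A = two_cliques()
pi_hat, r = spectral_ordering_bethe(A)
tau = 1.0 / r - 1.0  # from r = 1 / (1 + tau); negative because r > 1
print(r, tau, np.argsort(pi_hat))
```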

III.5 Regularized Laplacian

During the past decade, it has been found that the performance of Laplacian-based spectral clustering can be considerably improved by adding a constant value to every element of the adjacency matrix [26, 28] or to the diagonal elements of the degree matrix [25, 27]. Although the two variants of the Laplacian are often termed differently, we collectively refer to them as the regularized Laplacian [28] for simplicity, and we denote the former version by $L^{(\tau)}$ and the latter by $\mathcal{L}_{\tau}$. The spectral clustering method based on $\mathcal{L}_{\tau}$ can also be interpreted as a continuous relaxation of the minimization of the core cut function [29]. This section considers the spectral ordering method using a regularized Laplacian.

As in the formulation of the spectral ordering method with the modularity matrix, we consider the continuous relaxation of $H_2(\boldsymbol{\pi}; A)$ with a penalty term, namely, the following objective function:

$$\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 + \tau N\,\mathrm{Var}[\boldsymbol{x}], \qquad (29)$$

which is minimized with respect to the continuous vector $\boldsymbol{x}$, where $\tau$ is an arbitrary positive constant (hyperparameter) and

$$\mathrm{Var}[\boldsymbol{x}] = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2 \qquad (30)$$

is the variance with respect to the elements in $x$ . To ensure that $x$ is not a vector of zeros, we impose the following ellipsoidal constraint:

$$\sum_{i=1}^{N} (d_i + \tau) x_i^2 = 1. \qquad (31)$$

By incorporating this constraint, the objective function to be minimized is given by

$$\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 + \tau\sum_{i=1}^{N} x_i^2 - \frac{\tau}{N}\left(\sum_{i=1}^{N} x_i\right)^2 - \lambda\left(\sum_{i=1}^{N} (d_i + \tau) x_i^2 - 1\right). \qquad (32)$$

Because a vector proportional to $\boldsymbol{1}$ is a trivial minimizer of Eq. (32), we also impose the constraint that $\boldsymbol{x}$ is perpendicular to $\boldsymbol{1}$, i.e., $\sum_{i=1}^{N} x_i = 0$. Then, the extremum conditions of Eq. (32) yield

$$A^{(\tau)}\boldsymbol{\nu}_2 = (1 - \lambda_2)(D + \tau I)\boldsymbol{\nu}_2, \quad \left(A^{(\tau)}_{ij} = A_{ij} + \frac{\tau}{N}\right), \qquad (33)$$

where $I$ is the identity matrix. $λ_{2}$ is the second-smallest eigenvalue of the generalized eigenvalue equation and $\boldmathν2$ is the associated generalized eigenvector. Equation (33) is equivalent to

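$$L^{(\tau)}\boldsymbol{z}_2 = \lambda_2\boldsymbol{z}_2, \qquad (34)$$

where

$$L^{(\tau)} = I - (D + \tau I)^{-1/2} A^{(\tau)} (D + \tau I)^{-1/2} \qquad (35)$$

is the regularized Laplacian and $\boldsymbol{z}_2 = (D + \tau I)^{1/2}\boldsymbol{\nu}_2$. As in the spectral ordering method with the normalized Laplacian, we estimate the optimal sequence $\hat{\boldsymbol{\pi}}$ as

$$\hat{\boldsymbol{\pi}} = \{\mathrm{rank}((d_i + \tau)^{-1/2} z_{2i}) \mid i \in I\}. \qquad (36)$$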
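A minimal dense sketch of Eqs. (33) to (36), assuming NumPy and $\tau$ set to the average degree (the value used in this study); the helper names and the two-clique toy graph are hypothetical. Because $A^{(\tau)}$ is dense, this version is only practical for small graphs. The checks confirm the trivial zero eigenvalue that motivates the constraint $\sum_i x_i = 0$ and the generalized problem of Eq. (33).

```python
import numpy as np


def rank_positions(values):
    """1-based position of each entry after an ascending sort."""
    order = np.argsort(values, kind="stable")
    positions = np.empty(len(values), dtype=int)
    positions[order] = np.arange(1, len(values) + 1)
    return positions


def two_cliques():
    """Toy graph: 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


def spectral_ordering_reg_dense(A, tau=None):
    """Sequence estimate of Eq. (36) using L^(tau) of Eq. (35)."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    d = A.sum(axis=1)
    if tau is None:
        tau = d.mean()                                   # average degree
    A_tau = A + tau / n                                  # A^(tau)_ij = A_ij + tau / N
    s = 1.0 / np.sqrt(d + tau)                           # diagonal of (D + tau I)^{-1/2}
    L_tau = np.eye(n) - s[:, None] * A_tau * s[None, :]  # Eq. (35)
    eigvals, eigvecs = np.linalg.eigh(L_tau)             # ascending order
    nu2 = s * eigvecs[:, 1]                              # nu_2 = (D + tau I)^{-1/2} z_2
    return rank_positions(nu2), eigvals, nu2, tau


A = two_cliques()
pi_hat, eigvals, nu2, tau = spectral_ordering_reg_dense(A)
D_tau = np.diag(A.sum(axis=1) + tau)
A_tau = A + tau / len(A)
print(np.isclose(eigvals[0], 0.0))                               # trivial eigenvalue
print(np.allclose(A_tau @ nu2, (1 - eigvals[1]) * D_tau @ nu2))  # Eq. (33)
```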

Figure 5: Ellipse $x_1^2/a^2 + x_2^2/b^2 = 1$ ($a = 3$ and $b = 1$), where $x_2$ corresponds to the variable for a hub vertex. The color depth represents the variance $\mathrm{Var}[\boldsymbol{x}]$ for $\boldsymbol{x} = (x_1, x_2)$. Although most points on the ellipse have $|x_1| > |x_2|$, the variance is smaller when $x_1$ and $x_2$ are closer.

The effect of hub vertices is more complicated in this method than in the other methods. As shown in Fig. 5, whereas $\{x_i\}$ for hub vertices tend to be relatively small because of the ellipsoidal constraint, the variance $\mathrm{Var}[\boldsymbol{x}]$ is minimized when all $x_i$ take the same value. Therefore, when the hyperparameter $\tau$ is small, the result is similar to that obtained by the spectral ordering method based on the normalized Laplacian. As $\tau$ increases, the hub vertices are less likely to be located in the middle of the sequence because of the penalty term.

As mentioned above, we also consider

$$\mathcal{L}_{\tau} = (D + \tau I)^{-1/2} A (D + \tau I)^{-1/2} \qquad (37)$$

as the definition of a regularized Laplacian. Unlike in $L^{(\tau)}$, only the degree matrix is perturbed by a constant value in $\mathcal{L}_{\tau}$. If we consider

$$\frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 + \tau\sum_{i=1}^{N} x_i^2 \qquad (38)$$

as the objective function to be minimized and impose the ellipsoidal constraint (31), the extremum conditions yield an eigenvalue equation with respect to $\mathcal{L}_{\tau}$.

If we also impose the constraint $\sum_{i=1}^{N} x_i = 0$ in Eq. (38), this objective function becomes equivalent to Eq. (29). We do not impose this constraint because, unlike $L^{(\tau)}$, $\mathcal{L}_{\tau}$ does not have a trivial eigenvector corresponding to $\boldsymbol{1}$. The eigenvectors of $L^{(\tau)}$ and $\mathcal{L}_{\tau}$ are therefore distinct. As a spectral ordering method with the regularized Laplacian $\mathcal{L}_{\tau}$, we replace $\boldsymbol{z}_2$ in Eq. (36) with the eigenvector associated with the second-smallest eigenvalue of $\mathcal{L}_{\tau}$. Here, we use the second-smallest eigenvalue because $\mathcal{L}_{\tau}$ approaches the normalized Laplacian $\mathcal{L}$ as $\tau \to 0$, and the eigenvector of $\mathcal{L}$ associated with its smallest eigenvalue is trivial. Hereafter, when we refer to the spectral ordering method with the regularized Laplacian, we employ $\mathcal{L}_{\tau}$ because it is more computationally efficient. Throughout this study, we set $\tau$ to the average degree of the graph, as it is a commonly employed value [27].
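A minimal dense sketch of the $\mathcal{L}_{\tau}$-based ordering, assuming NumPy, $\tau$ equal to the average degree, and hypothetical helper names. One interpretive assumption is flagged in the code: extremizing Eq. (38) under Eq. (31) yields $(I - \mathcal{L}_{\tau})\boldsymbol{z} = \lambda\boldsymbol{z}$ with $\mathcal{L}_{\tau}$ as written in Eq. (37), so "second-smallest" is read here as the second-smallest $\lambda$, i.e., the second-largest eigenvalue of the matrix in Eq. (37). For large sparse graphs, one would typically call a sparse eigensolver such as `scipy.sparse.linalg.eigsh` instead of the dense `numpy.linalg.eigh` used here.

```python
import numpy as np


def rank_positions(values):
    """1-based position of each entry after an ascending sort."""
    order = np.argsort(values, kind="stable")
    positions = np.empty(len(values), dtype=int)
    positions[order] = np.arange(1, len(values) + 1)
    return positions


def two_cliques():
    """Toy graph: 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4."""
    A = np.zeros((8, 8))
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    A[i, j] = 1.0
    A[3, 4] = A[4, 3] = 1.0
    return A


def spectral_ordering_reg_degree(A, tau=None):
    """Ordering with the degree-regularized matrix of Eq. (37)."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    if tau is None:
        tau = d.mean()                          # average degree, as in the text
    s = 1.0 / np.sqrt(d + tau)                  # diagonal of (D + tau I)^{-1/2}
    cal_l_tau = s[:, None] * A * s[None, :]     # Eq. (37): A itself is not shifted
    _, eigvecs = np.linalg.eigh(cal_l_tau)      # ascending order
    # Assumption: Eq. (38) under Eq. (31) gives (I - cal_l_tau) z = lambda z,
    # so the second-smallest lambda is the second-largest eigenvalue here.
    z2 = eigvecs[:, -2]
    nu2 = s * z2                                # back-transform as in Eq. (36)
    return rank_positions(nu2)


A = two_cliques()
pi_hat = spectral_ordering_reg_degree(A)
print(np.argsort(pi_hat))  # vertices listed in the estimated order
```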

| Matrix | Constraints | Penalty terms | Effects on hub locations |
| --- | --- | --- | --- |
| Unnormalized Laplacian: $L = D - A$ | $\sum_{i=1}^{N} x_i^2 = 1$, $\sum_{i=1}^{N} x_i = 0$ | | |
| Normalized Laplacian: $\mathcal{L} = D^{-1/2} L D^{-1/2}$ | $\sum_{i=1}^{N} d_i x_i^2 = 1$, $\sum_{i=1}^{N} x_i = 0$ | | Concentrate around the middle |
| Modularity matrix: $Q = A - \frac{\boldsymbol{d}^{\top}\boldsymbol{d}}{2M}$ ($\boldsymbol{d} = (d_1, \dots, d_N)$) | $\sum_{i=1}^{N} x_i^2 = 1$ | $\frac{(\sum_i d_i x_i)^2}{2M} - \sum_i d_i x_i^2$ | Distribute at both ends |
| Bethe Hessian: $B = D - rA$ | $\sum_{i=1}^{N} x_i^2 = 1$, perpendicular to $\boldsymbol{\nu}_1$ | $\tau \sum_{i=1}^{N} d_i x_i^2$ | $\tau > 0$: concentrate around the middle; $\tau < 0$: distribute at either/both ends |
| Regularized Laplacian: $L^{(\tau)} = I - D_{(\tau)}^{-1/2} A^{(\tau)} D_{(\tau)}^{-1/2}$ ($D_{(\tau)} = D + \tau I$) | $\sum_{i=1}^{N} (d_i + \tau) x_i^2 = 1$, $\sum_{i=1}^{N} x_i = 0$ | $\tau N \mathrm{Var}[\boldsymbol{x}]$ | Small $\tau$: concentrate around the middle; large $\tau$: avoid concentration around the middle |
| Regularized Laplacian: $\mathcal{L}_{\tau} = D_{(\tau)}^{-1/2} A D_{(\tau)}^{-1/2}$ | $\sum_{i=1}^{N} (d_i + \tau) x_i^2 = 1$, perpendicular to $\boldsymbol{\nu}_1$ | $\tau \sum_{i=1}^{N} x_i^2$ | Small $\tau$: concentrate around the middle; large $\tau$: avoid concentration around the middle |

Table 1: Summary of the constraints and penalty terms in the spectral ordering methods and their effects on hub locations in the vertex sequence.

The constraints and penalty terms for each method are summarized in Table 1. Compared with the classical method based on the normalized Laplacian, in which hub vertices are concentrated around the middle of the sequence (“hub-centered”), the spectral ordering methods obtained with the modularity matrix, Bethe Hessian, and regularized Laplacian may place hub vertices at both ends of the sequence (“hub-at-the-corner”). In particular, for the Bethe Hessian and the regularized Laplacian, we can choose between the “hub-centered” and “hub-at-the-corner” alignments by tuning the hyperparameter. The next section investigates the practical performance of these spectral ordering and clustering methods using synthetic and real-world datasets. Although one might expect all the methods to perform similarly when the graph is close to regular, it is not obvious whether this always holds; we investigate this using synthetic datasets in Secs. IV.1 and IV.2. The effect of a heterogeneous degree distribution on each method is examined using real-world datasets in Sec. IV.3.

IV Performance analysis

We conduct a numerical performance analysis of the spectral ordering and clustering methods using synthetic graphs and real-world networks. For experiments on synthetic graphs, we consider a random graph model with a prespecified module structure, which is referred to as the stochastic block model (SBM) [30, 31, 32], and a random graph model with a prespecified sequentially local structure, which is referred to as the ordered random graph model (ORGM) [10].

IV.1 Stochastic block model

The SBM is often used as a generative model for the inference of module structures in graphs [33, 34] and in several theoretical studies in the community detection literature [35]. In the SBM, each vertex has a “planted” (or preassigned) group assignment; we denote the corresponding partition as $\boldsymbol{σ}^{B}$. Each vertex pair is connected by an edge, independently and randomly, based on the planted group assignments. The probabilities for the upper-right elements of the adjacency matrix are given as follows:

\mathrm{Prob}[\{A_{ij}\}_{i < j}] = \prod_{i < j} p_{σ_{i}^{B} σ_{j}^{B}}^{A_{ij}} \left(1 - p_{σ_{i}^{B} σ_{j}^{B}}\right)^{1 - A_{ij}},

(39)

where $p_{k ℓ}$ is the probability that a vertex in group $k$ and a vertex in group $ℓ$ are connected (in Eq. (39), $k = σ_{i}^{B}$ and $ℓ = σ_{j}^{B}$). We have $A_{ij} = A_{ji}$ for any pair of elements because we consider undirected graphs. In general, the SBM can generate graphs with complex module structures. Herein, however, we focus on the SBM with a community structure that is characterized by the following group-wise connection probability:

p_{k ℓ} = \begin{cases} p_{in} & (k = ℓ), \\ p_{out} & (k \neq ℓ), \end{cases}

(40)

where $0 < p_{o u t} \leq p_{i n} \leq 1$ , that is, vertices are more densely connected within the same planted group than between different groups. In particular, when the group sizes are equal, it is common to parametrize the model using the average degree $c$ and the fuzziness parameter $ϵ$ , which are related to $p_{i n}$ and $p_{o u t}$ as

c = \frac{N}{K} (p_{i n} + (K - 1) p_{o u t}), ϵ = \frac{p_{o u t}}{p_{i n}} .

(41)
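Equation (41) can be inverted to give $p_{in} = cK / [N(1 + (K - 1)ϵ)]$ and $p_{out} = ϵ p_{in}$. The sketch below uses this inversion to sample one undirected graph from the planted partition model of Eqs. (39) and (40), assuming equal group sizes ($K$ divides $N$) and no self-loops; the function names are hypothetical.

```python
import numpy as np


def sbm_probabilities(N, K, c, eps):
    """Invert Eq. (41): return (p_in, p_out) for average degree c and fuzziness eps."""
    p_in = c * K / (N * (1 + (K - 1) * eps))
    p_out = eps * p_in
    if not (0 <= p_out <= p_in <= 1):
        raise ValueError("parameters give probabilities outside 0 <= p_out <= p_in <= 1")
    return p_in, p_out


def sample_planted_partition(N, K, c, eps, seed=None):
    """Draw one graph from the planted partition model (sketch, equal group sizes)."""
    if N % K != 0:
        raise ValueError("this sketch assumes equal group sizes (K must divide N)")
    rng = np.random.default_rng(seed)
    p_in, p_out = sbm_probabilities(N, K, c, eps)
    sigma = np.repeat(np.arange(K), N // K)               # planted labels sigma^B
    P = np.where(sigma[:, None] == sigma[None, :], p_in, p_out)
    upper = np.triu(rng.random((N, N)) < P, k=1)          # independent draws for i < j
    A = (upper | upper.T).astype(int)                     # symmetrize: A_ij = A_ji
    return A, sigma
```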

As $ϵ$ approaches unity, the planted community structure becomes less clear. This particular case of the SBM is known as the planted partition model [36]. For a given average degree $c$ , the critical value of $ϵ$ above which an algorithm cannot detect the planted block structure better than chance is called the (algorithmic) detectability limit [37, 38, 39, 40, 41, 42].

Using the SBM and spectral ordering methods, we investigate the following questions:

What would the reordered adjacency matrix look like? Can we visually identify the community structure through the matrix?
When and how would the spectral ordering methods lose their correlations with the planted partition in the SBM? Are the spectral ordering methods superior or inferior to their clustering counterparts in detecting the planted partition? Does the choice of matrix matter?

To answer these questions, we apply both spectral ordering and clustering methods to graphs generated by the SBM.

Figure 6: Spectral ordering with the regularized Laplacian for graphs generated by the SBM ($N = 60$ and $c = 8$). The top panels show the adjacency matrices of a graph with a strong community structure where vertices are aligned by (a) the sequence based on the planted partition $\boldsymbol{σ}^{B}$ and (b) an inferred sequence $\hat{\boldsymbol{π}}$. The bottom panels show the adjacency matrices of a graph with a weak community structure where vertices are aligned based on (c) $\boldsymbol{σ}^{B}$ and (d) $\hat{\boldsymbol{π}}$. The matrix elements indicating the connections within the same planted group are represented in the same color; otherwise, the elements are colored in gray.

We first investigate the former question. Figure 6 shows the results of spectral ordering applied to instances of the SBM. Vertices in the same planted group are indeed located close to each other in the inferred sequence when the community structure is strong. Even when the community structure is weak, the planted group labels and the inferred sequence are correlated. In both examples, the boundaries of the groups are ambiguous. Therefore, if we do not know the planted group labels (i.e., without the coloring of the adjacency matrix elements), it is not clear from the reordered adjacency matrix whether the identified structure is a community structure or a banded structure. Note that, as discussed in [10], even when we generate a graph from a uniformly random graph model, one could identify a weak banded structure owing to the ordering of vertices.

Next, we address the latter question. The consistency between the sequence $\hat{\boldsymbol{π}}$ inferred by a spectral ordering method and the planted partition $\boldsymbol{σ}^{B}$ is measured with the normalized LCE $Δ(\hat{\boldsymbol{π}}, \boldsymbol{σ}^{B})/\bar{Δ}(\{N_{k}^{B}\})$. Here, $\{N_{k}^{B}\}$ is the set of group sizes in $\boldsymbol{σ}^{B}$, and $\bar{Δ}(\{N_{k}^{B}\})$ is the mean LCE under a random sequence, defined in Eq. (9). When $Δ(\hat{\boldsymbol{π}}, \boldsymbol{σ}^{B})$ saturates (i.e., the normalized LCE equals unity) as $ϵ$ increases, the spectral ordering method does not infer $\boldsymbol{σ}^{B}$ better than random; we then deem that the algorithm has reached the detectability limit.
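The quantities in this paragraph can be sketched in Python as follows. The LCE is written as $Δ = (N - K)/(N - 1) - C$, the form used in Appendices A and B, where $C$ is the fraction of adjacent pairs in the sequence that share a group label. For the normalization we assume a uniformly random permutation of the vertices with the partition held fixed, for which two adjacent vertices share a label with probability $\sum_{k} N_{k}(N_{k} - 1)/[N(N - 1)]$; this closed form is our own and may differ in detail from Eq. (9). All function names are illustrative.

```python
from collections import Counter


def label_continuity(sequence, labels):
    """C: fraction of the N - 1 adjacent pairs in `sequence` whose labels coincide.

    `sequence` lists vertex indices in order; `labels[v]` is the group of vertex v.
    """
    same = sum(labels[a] == labels[b] for a, b in zip(sequence[:-1], sequence[1:]))
    return same / (len(sequence) - 1)


def lce(sequence, labels):
    """Delta = (N - K)/(N - 1) - C; zero when every group forms one contiguous block."""
    n, k = len(sequence), len(set(labels))
    return (n - k) / (n - 1) - label_continuity(sequence, labels)


def mean_lce_random_sequence(labels):
    """Expected Delta over uniformly random sequences with the partition held fixed.

    Each adjacent pair is then a uniformly random pair of distinct vertices,
    which shares a label with probability sum_k N_k (N_k - 1) / (N (N - 1)).
    """
    n = len(labels)
    sizes = Counter(labels).values()
    k = len(sizes)
    p_same = sum(s * (s - 1) for s in sizes) / (n * (n - 1))
    return (n - k) / (n - 1) - p_same


def normalized_lce(sequence, labels):
    """Delta divided by its random-sequence mean (undefined when K = 1)."""
    return lce(sequence, labels) / mean_lce_random_sequence(labels)
```

For two groups of three vertices, the block-wise sequence gives $Δ = 0$, whereas the alternating sequence reaches $Δ = 0.8$, twice the random-sequence mean of $0.4$.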

The consistency between the partition $\hat{\boldsymbol{σ}}$ inferred by a spectral clustering method and the planted partition $\boldsymbol{σ}^{B}$ is measured using the normalized mutual information (NMI) [43], which is defined as

\mathrm{NMI}(\hat{\boldsymbol{σ}}, \boldsymbol{σ}^{B}) = \frac{2 I(\hat{\boldsymbol{σ}}; \boldsymbol{σ}^{B})}{H(\hat{\boldsymbol{σ}}) + H(\boldsymbol{σ}^{B})},

(42)

where

H(\boldsymbol{σ}) = -\sum_{k \in \{1, \dots, K\}} q(k) \log q(k) \qquad \left(q(k) = \frac{N_{k}}{N}\right)

(43)

is the entropy with respect to the frequency of group labels, and

I(\boldsymbol{σ}_{1}; \boldsymbol{σ}_{2}) = \sum_{k \in \{1, \dots, K\}} \sum_{k' \in \{1, \dots, K\}} q(k, k') \log \frac{q(k, k')}{q(k) q(k')}

(44)

is the mutual information. Here, $q(k, k')$ is the fraction of vertices that belong to group $k$ in partition $\boldsymbol{σ}_{1}$ and to group $k'$ in partition $\boldsymbol{σ}_{2}$. The NMI is unity when a pair of partitions coincides perfectly (up to a relabeling of the groups). When $\mathrm{NMI}(\hat{\boldsymbol{σ}}, \boldsymbol{σ}^{B})$ reaches (nearly) zero as $ϵ$ increases, the spectral clustering method does not infer $\boldsymbol{σ}^{B}$ better than random, which again represents the detectability limit. The detectability of spectral clustering methods is not a new topic; it has been analyzed in several theoretical and benchmark studies [40, 41, 42, 44, 45]. We evaluate $Δ(\hat{\boldsymbol{π}}, \boldsymbol{σ}^{B})$ and $\mathrm{NMI}(\hat{\boldsymbol{σ}}, \boldsymbol{σ}^{B})$ to compare the performance of the spectral ordering and clustering methods for each of the matrices considered in the previous section.
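A direct transcription of Eqs. (42)–(44) is sketched below, with partitions given as lists of group labels indexed by vertex. It uses natural logarithms (the base cancels in the ratio) and the convention $0 \log 0 = 0$, so only co-occurring label pairs are summed. The function names are illustrative, and the NMI is undefined when both partitions consist of a single group.

```python
import math
from collections import Counter


def entropy(labels):
    """H(sigma) of Eq. (43) with q(k) = N_k / N."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def mutual_information(labels1, labels2):
    """I(sigma_1; sigma_2) of Eq. (44); only nonzero co-occurrences contribute."""
    n = len(labels1)
    q1, q2 = Counter(labels1), Counter(labels2)
    joint = Counter(zip(labels1, labels2))
    return sum((m / n) * math.log((m / n) / ((q1[a] / n) * (q2[b] / n)))
               for (a, b), m in joint.items())


def nmi(labels1, labels2):
    """Eq. (42): 2 I / (H1 + H2)."""
    return 2 * mutual_information(labels1, labels2) / (entropy(labels1) + entropy(labels2))
```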

Figure 7: Detectability of the SBM for the spectral ordering and spectral clustering methods. The top panels show the result for small graphs with (a) $K = 2$ , (b) $K = 3$ , and (c) $K = 4$ . The bottom panels show the result for large graphs with (d) $K = 2$ , (e) $K = 3$ , and (f) $K = 4$ . In each panel, the values of the NMI obtained by the spectral clustering methods (top) and the values of the LCE obtained by the spectral ordering methods (bottom) are shown. The horizontal axis represents the fuzziness of community structure $ϵ$ . Each symbol and error bar represents the mean and the standard deviation of $30$ samples that are obtained with the same SBM parameters.

Figure 7 shows the performance of the ordering and clustering methods on the SBM for different graph sizes $N$, numbers of blocks $K$, and fuzziness parameters $ϵ$. When graphs are small (and thus relatively dense), there is no clear saturation in the curves of the LCE and the NMI, and it is difficult to determine whether the ordering or the clustering methods perform better in terms of the detectability limit. Moreover, the differences in performance among the matrices are not noticeable, except for the unnormalized Laplacian. When graphs are large, we can clearly identify the saturation. For the unnormalized and normalized Laplacians, the values of the LCE decrease gradually even where the values of the NMI have saturated, indicating that the spectral ordering methods are superior to their clustering counterparts. By contrast, the detectability limits of the modularity matrix, regularized Laplacian, and Bethe Hessian are not very different between the ordering and clustering methods. In addition, the methods with the regularized Laplacian and Bethe Hessian perform similarly and are superior to those with the other matrices, whereas the methods with the unnormalized Laplacian are clearly inferior.

IV.2 Ordered random graph model

Figure 8: Results of the spectral clustering methods using the normalized Laplacian with different numbers of groups $K$ , applied to a graph generated by the ORGM. The parameters of the ORGM are $N = 50$ , $c = 10$ , $ϵ = 0.2$ and $r / N = 0.16$ . (a) The vertices of the adjacency matrix are ordered based on the original ordering in the ORGM. For panels (b)–(d), the vertices are ordered such that the vertices in the same inferred group are close to each other: (b) $K = 2$ , (c) $K = 3$ , and (d) $K = 4$ . The nonzero matrix elements are represented by the same color when they are edges connecting the vertices within an inferred group; otherwise, the nonzero matrix elements are colored in gray.

Figure 9: Performance of the spectral clustering method for the graphs generated by the ORGM. The graphs are generated by the ORGM with $N = 1000$ and $c = 6$. Each panel shows the normalized LCE $Δ(I, \hat{\boldsymbol{σ}})/\bar{Δ}(\{\hat{N}_{k}\})$ for various parameter sets of the ORGM for a matrix used in the spectral clustering method. Each point represents the $10$-sample average of the normalized LCE under the same parameter set.

We have observed how and to what extent the community structure can be inferred using spectral ordering methods. This section discusses the opposite scenario. That is, we analyze whether the spectral clustering methods can infer banded structures. To this end, we conduct a performance analysis using the ORGM.

The vertex set in the ORGM has a planted sequence, just as the vertex set in the SBM has a planted partition. We let the planted sequence coincide with the original sequence $I$. The edges in the ORGM are generated independently and randomly by referring to the planted sequence. We divide the space of the adjacency matrix elements into two regions, $Ω_{in}$ and $Ω_{out}$. $Ω_{in}$ (resp. $Ω_{out}$) is the set of elements in which an edge connects two vertices that are deemed to be “close” (resp. “not close”) to each other. An edge is generated between a vertex pair with probability $p_{in}$ if they are “close” and with probability $p_{out}$ otherwise. Therefore, the probabilities of the upper-right elements of the adjacency matrix are given as follows:

	$P(\{A_{ij}\}_{i < j} \mid \{p_{ij}\}) = \prod_{i < j} p_{ij}^{A_{ij}} (1 - p_{ij})^{1 - A_{ij}},$		(45)
	$p_{ij} = \begin{cases} p_{in} & ((i, j) \in Ω_{in}), \\ p_{out} & ((i, j) \in Ω_{out}). \end{cases}$		(46)

We set the boundary of $Ω_{i n}$ and $Ω_{o u t}$ as

Ω_{in} = \{(i, j) \mid |i - j| \leq r\}, \qquad Ω_{out} = \{(i, j) \mid |i - j| > r\},

(47)

where $r$ is the bandwidth that specifies the boundary of the regions. Although Eq. (47) is simple, the boundary in the ORGM can be more complex in general. In the following, instead of $p_{in}$ and $p_{out}$, we specify the edge density by the average degree $c$ and the strength of the banded structure $ϵ = p_{out} / p_{in}$; when $ϵ = 0$, nonzero elements in the adjacency matrix are completely confined within $Ω_{in}$, whereas the model is uniformly random when $ϵ = 1$. (See Fig. 8(a) for an example of the resulting adjacency matrix of the ORGM.) In summary, except for the number of vertices $N$, which is a nuisance parameter, the parameters of the ORGM are the average degree $c$, the strength of the banded structure $ϵ$, and the bandwidth ratio $r / N$.
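A minimal ORGM sampler with the band of Eq. (47) is sketched below. The text does not state how $p_{in}$ follows from $c$, so we assume (our choice) that the expected average degree equals $c$: with $n_{in} = rN - r(r + 1)/2$ pairs $i < j$ inside the band and $n_{out}$ remaining pairs, $c = 2(p_{in} n_{in} + p_{out} n_{out})/N$ with $p_{out} = ϵ p_{in}$. The bandwidth is taken as the rounded value of $(r/N) \cdot N$; the function names are hypothetical.

```python
import numpy as np


def orgm_probabilities(N, c, eps, r):
    """(p_in, p_out) giving expected average degree c (assumed parametrization)."""
    n_in = r * N - r * (r + 1) // 2           # pairs i < j with j - i <= r
    n_out = N * (N - 1) // 2 - n_in           # all remaining pairs
    p_in = c * N / (2 * (n_in + eps * n_out))
    p_out = eps * p_in
    if p_in > 1:
        raise ValueError("c is too large for this bandwidth (p_in > 1)")
    return p_in, p_out


def sample_orgm(N, c, eps, r_ratio, seed=None):
    """Draw an undirected graph whose planted sequence is 0, 1, ..., N - 1."""
    rng = np.random.default_rng(seed)
    r = int(round(r_ratio * N))               # bandwidth r
    p_in, p_out = orgm_probabilities(N, c, eps, r)
    idx = np.arange(N)
    in_band = np.abs(idx[:, None] - idx[None, :]) <= r    # Omega_in, Eq. (47)
    P = np.where(in_band, p_in, p_out)                    # Eq. (46)
    upper = np.triu(rng.random((N, N)) < P, k=1)          # Eq. (45), pairs i < j only
    return (upper | upper.T).astype(int)
```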

Using the ORGM and the spectral clustering methods, we investigate the following questions:

What would the reordered adjacency matrix look like? Can we visually identify the banded structure through the matrix?
How and when would the spectral clustering algorithms lose their correlations with the planted ordering in the ORGM?

We first investigate the former question. Figure 8 shows the results of a spectral clustering method with different values of $K$ applied to a graph generated by the ORGM. The graph tends to be partitioned into equally sized groups (see also Fig. S2 in the Supplementary Material for quantitative evidence). Recall that we observed a banded structure through a spectral ordering method even when the graph was generated from the SBM (Fig. 6). Analogously, we can identify block-diagonal structures in Fig. 8 although the graph is generated from the ORGM. This observation is interesting because it implies that some of the community structures identified in the literature may be better described by banded structures.

Figure 9 shows the normalized LCE $Δ(I, \hat{\boldsymbol{σ}})/\bar{Δ}(\{\hat{N}_{k}\})$ between the planted sequence $I$ and the inferred partition $\hat{\boldsymbol{σ}}$, where $\{\hat{N}_{k}\}$ is the set of group sizes in $\hat{\boldsymbol{σ}}$. The normalized LCE is generally low when $r / N$ is neither too small nor too large and $ϵ$ is small.

The existence of detectability limits is implied by Fig. 9. In the limit of $N \to \infty$, there exists a critical value of $ϵ$ above which the normalized LCE is unity for any value of $r / N$. Moreover, for a given $ϵ$, there also exists an upper limit (and possibly a lower limit) of the bandwidth ratio $r / N$ beyond which the partition inferred by a spectral clustering method is no more correlated with the planted sequence than a random guess. These critical values depend on the average degree (see Fig. S3 in the Supplementary Material for the numerical phase diagrams).

As in the analysis of the SBM, the performance of the unnormalized Laplacian is notably inferior in terms of the normalized LCE; for most of the parameter sets, it does not perform better than a random guess. The modularity matrix, Bethe Hessian, and regularized Laplacian behave similarly, and the results for the latter two matrices appear identical. In contrast to the analysis of graphs generated by the SBM, the performance of the normalized Laplacian is as good as or even better than that of the Bethe Hessian and regularized Laplacian.

The inferior performance of spectral clustering with the unnormalized Laplacian can also be characterized by the distribution of the group sizes $\{N_{k}\}$. As depicted in Fig. S2 in the Supplementary Material, the fraction of the largest group, $\max_{k} N_{k} / N$, is nearly unity, i.e., most of the vertices belong to the same group. In such a case, the result of clustering contains very little information about the inherent ordering in the graph; as shown in Fig. 4(b), the upper bound $\max Δ$ and the mean value $\bar{Δ}$ under the random sequence are small when a partition is highly skewed, reflecting the fact that the group labels tend to be aligned consecutively for any sequence. A possible mechanism for such skewed distributions of group sizes is the emergence of localized eigenvectors [41, 46], which degrades the performance of spectral clustering. However, we do not pursue the detailed mechanism behind this outcome in this study.

In summary, we have confirmed that some spectral clustering methods detect community structures that are correlated with the inherent sequential structure of the ORGM, and that there are nontrivial limits of detectability.

IV.3 Real-world networks

Figure 10: Adjacency matrix aligned with spectral ordering based on a matrix annotated at the top, and its corresponding LCE, $Δ/\bar{Δ}$. Colors denote vertex groups inferred by the K-means method. (a) Zachary’s karate club network [47] and (b) a network of political books [48].

We now apply the spectral ordering and clustering methods to five empirical adjacency matrices. Descriptions of the empirical datasets are provided in Table S1. Note that many empirical datasets exhibit high degree heterogeneity, whereas the synthetic graphs in Secs. IV.1 and IV.2 do not. As discussed in Sec. III, spectral orderings with different matrices are characterized as minimizations of $H_{2}$ with different constraints and penalty terms (Table 1), and these differences become prominent when vertex degrees are heterogeneous.

In Fig. 10, we see a banded structure for the vertex orderings based on the normalized and unnormalized Laplacians for the karate club (Fig. 10(a)) and political books (Fig. 10(b)) datasets, where the hub vertices tend to be located around the middle of the optimized sequence. In contrast, the ordering method with the modularity matrix $Q$ places vertices with large degrees at both ends of the sequence, as expected from the penalty term in the objective function (20). A similar observation applies to the methods using the Bethe Hessian $B$ and the regularized Laplacian (Figs. 10, S4, and S7). Importantly, however, vertex orderings based on these matrices are critically influenced by the regularization parameter $τ$.

For many empirical graphs, the hyperparameter $r = \sqrt{\sum_{i} d_{i}^{2} / \sum_{i} d_{i} - 1}$ in the Bethe Hessian takes a large positive value, i.e., $τ < 0$ (Eq. (24)). Thus, the penalty term $τ \sum_{i} d_{i} x_{i}^{2}$ is negative and lowers the objective function further when high-degree vertices have large $x_{i}^{2}$; hence, hub vertices tend to be placed at the ends of the vertex sequence. By contrast, the spectral ordering with the regularized Laplacian has an exogenous regularization parameter $τ$ in its constraint and penalty terms (see Eqs. (32) and (38)), which we set to the average degree. As discussed in Sec. III.5, a larger value of $τ$ tends to keep hub vertices away from the middle of the sequence. Although the validation analysis based on synthetic graphs suggested that the sequences inferred with these matrices are fairly similar (Secs. IV.1 and IV.2), they do not necessarily coincide in general. Note also that, as $τ \to 0$, the Bethe Hessian approaches the unnormalized Laplacian, and the regularized Laplacian approaches the normalized Laplacian (Table 1). Therefore, when $|τ|$ is small, the optimal vertex sequences based on the Bethe Hessian and regularized Laplacian are close to those obtained from the unnormalized and normalized Laplacians, respectively (Figs. S8 and S9). Indeed, the location of vertices with large degrees in the optimal vertex sequence can be tuned by varying $τ$, from a “hub-centered” alignment to a “hub-at-the-corner” alignment.
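The value of $r$ quoted above depends only on the degree sequence, as the short helper below shows (the function name is hypothetical). For a $c$-regular graph it reduces to $\sqrt{c - 1}$, and degree heterogeneity increases it; how $r$ maps to $τ$ is given by Eq. (24), which is not reproduced here.

```python
import math


def bethe_hessian_r(degrees):
    """r = sqrt(sum_i d_i^2 / sum_i d_i - 1), computed from a degree sequence."""
    s1 = sum(degrees)
    s2 = sum(d * d for d in degrees)
    return math.sqrt(s2 / s1 - 1)
```

For instance, a star with nine leaves has a mean degree of only 1.8 but the same value $r = 2$ as a 5-regular graph.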

Figure 10 also shows the normalized LCE representing the consistency between the inferred sequence $\hat{\boldsymbol{π}}$ and group labels $\hat{\boldsymbol{σ}}$ for each matrix used in the spectral ordering and clustering methods. When we set $K = 2$ in the clustering method, as shown in Figs. 10(a) and 10(b), $\hat{\boldsymbol{π}}$ and $\hat{\boldsymbol{σ}}$ are perfectly consistent in terms of the LCE. For $K \geq 3$, the LCEs are mostly lower than $0.8$ and typically around $0.5$ (Figs. 10 and S14–S14), suggesting that the vertex sequences optimized by spectral ordering convey some information about a non-random structure. We also find that some methods yield similar LCEs for all datasets, whereas the LCEs obtained with the (un)normalized Laplacian exhibit different behaviors (Figs. S14–S14). This is consistent with the previous observation that the spectral orderings based on the (un)normalized Laplacian are quite distinct from those obtained from the modularity matrix, Bethe Hessian, and regularized Laplacian (Fig. 10).

Interestingly, even when the sequences optimized with different objective functions are distinct, the values of the normalized LCE can be very close to each other. This suggests that the reordered adjacency matrices may exhibit the same or similar structures from the perspective of community structure and differ only in the detailed orderings within each group.

V Summary and discussion

This study analyzed the relationship between the ordering and clustering methods for graphs by using the LCE to quantify the extent to which vertices close to each other in the optimized sequence have the same group label. To obtain analytical insight into spectral ordering, we first showed that the spectral ordering problem is formulated as the minimization of the squared sequential distance $H_{2}$ subject to a particular penalty function and constraints, depending on the matrix representation of a graph (e.g., the normalized Laplacian or the modularity matrix). The numerical results suggested that the spectral ordering methods, except the one based on the unnormalized Laplacian, often yield optimized sequences in which vertices in the same group are close to each other; that is, the normalized LCEs are considerably below $1$ as long as strong community structures exist.

Several issues remain to be addressed in future studies. First, the LCE quantifies the continuity of group labels for a given vertex sequence, but the consistency between ordering and clustering can also be measured in other ways; for example, one can quantify the continuity of indices in a vertex sequence for given group labels on the vertices. Second, we focused on unipartite graphs, whose connectivities are represented by square matrices (i.e., adjacency matrices). In principle, the proposed method can also be applied to non-square matrices, such as those representing bipartite graphs. Third, we implemented the ordering and clustering methods independently and examined their consistency. Given that we found some consistency between the two, it would be possible to develop a clustering method that incorporates information about the inherent vertex sequence. Analogously, the spectral ordering method can be adjusted so that the obtained vertex sequence reflects group labels. We expect that our work will stimulate further research in these directions.

Appendix A Upper bound of the label continuity error

Figure 11: Vertex sequence yielding the maximum LCE for given group sizes $\{N_{k}\}$. Panels (a) and (b) are cases where $N_{1} > ⌈N/2⌉$ and $N_{1} = N/2$, respectively, and panels (c) and (d) are cases where $N_{1} < N/2$. The sequence with the maximum LCE can be constructed by aligning the vertices with different labels alternately, following the procedure shown in each step. Vertices in the box are to be aligned in the following steps. The vertex indices are omitted because they are not essential for the construction of a sequence.

We derive the upper bound of the LCE by explicitly constructing a worst-case sequence. We assume that a partition $\boldsymbol{σ}$ is given (i.e., the number of groups $K$ and group sizes $\{N_{k}\}$ are given) and that the first group is the largest group (i.e., $\max_{k} N_{k} = N_{1} = |V_{1}|$). When $N_{1}$ satisfies $N_{1} > ⌈N/2⌉$, some vertices in $V_{1}$ must be aligned consecutively. As exemplified in Fig. 11(a), the LCE is maximized when the vertices in $V_{1}$ and those in $\cup_{k > 1} V_{k}$ are aligned as alternately as possible. In this case, there are $2(N - N_{1})$ pairs of adjacent vertices with different group labels, and the label continuity is $C = (2 N_{1} - N - 1)/(N - 1)$. Therefore, the maximum LCE is

	$Δ$	$= 1 - \frac{K - 1}{N - 1} - \frac{2 N_{1} - N - 1}{N - 1}$
		$= \frac{2 (N - N_{1})}{N - 1} - \frac{K - 1}{N - 1},$		(48)

which corresponds to the upper case of Eq. (5).

When $N_{1}$ is less than or equal to the total number of vertices in all other groups (Figs. 11(b), 11(c), and 11(d)), vertices can be aligned such that no group labels are consecutive. Such a sequence is constructed as follows. We first align the vertices in $V_{1}$ and the vertices in $\cup_{k > 1} V_{k}$ as alternately as possible. In this step, all the vertices in $V_{1}$ are aligned, and there are $\sum_{k > 1} N_{k} - N_{1}$ vertices that are not yet aligned; here, in $\cup_{k > 1} V_{k}$, we preferentially consume the labels with larger $N_{k}$ (Fig. 11(b) and Step 1 in Figs. 11(c) and 11(d)). When there are remaining vertices, we regard each set of alternately aligned vertices as a fundamental unit and treat all such sets as “super vertices” with the same label. We then align the super vertices and the remaining vertices in the same manner as in the previous step. We repeat this procedure until all vertices are aligned. We can always align vertices and super vertices alternately because the number of remaining vertices with the same label never exceeds the number of already aligned vertices or super vertices. Therefore, we can establish a sequence for which the label continuity $C$ is zero, and the upper bound of the LCE is

Δ = 1 - \frac{K - 1}{N - 1}.

(49)
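The two cases above can be checked against an exhaustive search for small $N$. In the sketch below (function names hypothetical), the condition $N_{1} > N - N_{1}$ switches between Eqs. (48) and (49); for odd $N$ with $N_{1} = ⌈N/2⌉$ both expressions give the same value, so this choice agrees with the conditions stated above. The brute-force routine enumerates all arrangements of the labels and is only practical for $N$ up to about 10.

```python
from itertools import permutations


def max_lce(sizes):
    """Upper bound of the LCE for group sizes {N_k}, Eqs. (48) and (49)."""
    n, k, n1 = sum(sizes), len(sizes), max(sizes)
    if n1 > n - n1:
        # The largest group cannot be fully interleaved: Eq. (48).
        return (2 * (n - n1) - (k - 1)) / (n - 1)
    # A sequence without consecutive equal labels exists: Eq. (49).
    return 1 - (k - 1) / (n - 1)


def brute_force_max_lce(sizes):
    """Maximize Delta = (N - K)/(N - 1) - C over all label arrangements (small N only)."""
    labels = [g for g, s in enumerate(sizes) for _ in range(s)]
    n, k = len(labels), len(sizes)
    fewest_same = min(sum(a == b for a, b in zip(p[:-1], p[1:]))
                      for p in set(permutations(labels)))
    return (n - k) / (n - 1) - fewest_same / (n - 1)
```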

Appendix B Variance of the label continuity error in random partitions

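Under bootstrapped group labels $\boldsymbol{σ}^{*}$, for which each label is drawn independently with probability $P(σ_{i}^{*} = k) = N_{k}/N$, the second moment of $Δ$ is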

\begin{aligned}
E[Δ^{2}] &= \sum_{\boldsymbol{σ}^{*}} P(\boldsymbol{σ}^{*}) \left( \frac{N - K}{N - 1} - \frac{\sum_{i=1}^{N-1} δ(σ^{*}(i), σ^{*}(i+1))}{N - 1} \right)^{2} \\
&= \left( \frac{N - K}{N - 1} \right)^{2} - \frac{2 (N - K)}{(N - 1)^{2}} \sum_{i=1}^{N-1} \sum_{σ_{i}^{*}, σ_{i+1}^{*}} P(σ_{i}^{*}) P(σ_{i+1}^{*}) δ(σ^{*}(i), σ^{*}(i+1)) \\
&\quad + \frac{1}{(N - 1)^{2}} \sum_{i=1}^{N-1} \sum_{σ_{i}^{*}, σ_{i+1}^{*}} P(σ_{i}^{*}) P(σ_{i+1}^{*}) δ(σ^{*}(i), σ^{*}(i+1)) \\
&\quad + \frac{2}{(N - 1)^{2}} \sum_{i=1}^{N-2} \sum_{σ_{i}^{*}, σ_{i+1}^{*}, σ_{i+2}^{*}} P(σ_{i}^{*}) P(σ_{i+1}^{*}) P(σ_{i+2}^{*}) \, δ(σ^{*}(i), σ^{*}(i+1)) \, δ(σ^{*}(i+1), σ^{*}(i+2)) \\
&\quad + \frac{1}{(N - 1)^{2}} \sum_{i, j \, (|i - j| \geq 2)} \sum_{σ_{i}^{*}, σ_{i+1}^{*}, σ_{j}^{*}, σ_{j+1}^{*}} P(σ_{i}^{*}) P(σ_{i+1}^{*}) P(σ_{j}^{*}) P(σ_{j+1}^{*}) \, δ(σ^{*}(i), σ^{*}(i+1)) \, δ(σ^{*}(j), σ^{*}(j+1)) \\
&= \left( \frac{N - K}{N - 1} \right)^{2} - 2 \frac{N - K}{N - 1} \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{2} + \frac{1}{N - 1} \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{2} \\
&\quad + \frac{2 (N - 2)}{(N - 1)^{2}} \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{3} + \frac{(N - 2)(N - 3)}{(N - 1)^{2}} \left( \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{2} \right)^{2}.
\end{aligned}

(50)

Thus, the variance $\mathrm{Var}[Δ]$ is

\begin{aligned}
\mathrm{Var}[Δ] &= E[Δ^{2}] - E[Δ]^{2} \\
&= \frac{1}{N - 1} \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{2} + \frac{2 (N - 2)}{(N - 1)^{2}} \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{3} - \frac{3 N - 5}{(N - 1)^{2}} \left( \sum_{k=1}^{K} \left( \frac{N_{k}}{N} \right)^{2} \right)^{2}.
\end{aligned}

(52)
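Equation (52), together with the mean $E[Δ] = (N - K)/(N - 1) - \sum_{k} (N_{k}/N)^{2}$ implied by the cross term of $E[Δ^{2}]$, can be checked by Monte Carlo sampling of bootstrapped labels. The NumPy sketch below (function names hypothetical) draws each label independently with probability $N_{k}/N$, as in the derivation above.

```python
import numpy as np


def lce_moments_bootstrap(sizes):
    """Mean and variance (Eq. (52)) of the LCE for i.i.d. labels with P(k) = N_k / N."""
    sizes = np.asarray(sizes, dtype=float)
    n, k = sizes.sum(), len(sizes)
    q = sizes / n
    s2, s3 = np.sum(q ** 2), np.sum(q ** 3)
    mean = (n - k) / (n - 1) - s2
    var = (s2 / (n - 1) + 2 * (n - 2) * s3 / (n - 1) ** 2
           - (3 * n - 5) * s2 ** 2 / (n - 1) ** 2)
    return mean, var


def lce_samples_bootstrap(sizes, trials, seed=0):
    """Monte Carlo draws of Delta = (N - K)/(N - 1) - C under bootstrapped labels."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes, dtype=float)
    n, k = int(sizes.sum()), len(sizes)
    labels = rng.choice(k, size=(trials, n), p=sizes / sizes.sum())
    same = (labels[:, 1:] == labels[:, :-1]).sum(axis=1)   # consecutive equal labels
    return (n - k) / (n - 1) - same / (n - 1)
```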

Appendix C Probability distribution of the label continuity error in random partitions

This section derives the probability distribution of the label continuity error $Δ(\boldsymbol{π}, \boldsymbol{σ})$ when group labels are assigned randomly based on bootstrapped group labels $\boldsymbol{σ}^{*}$.

To derive the probability distribution of $Δ(\boldsymbol{π}, \boldsymbol{σ})$, it is sufficient to calculate that of the label continuity $C(\boldsymbol{π}, \boldsymbol{σ})$. The probability of $(N - 1) C = m$ is

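\begin{aligned}
P[(N - 1) C = m] &= \sum_{\boldsymbol{σ}^{*}} P(\boldsymbol{σ}^{*}) \, δ\left(m, \sum_{i=1}^{N-1} δ(σ^{*}(i), σ^{*}(i+1))\right) \\
&= \sum_{\boldsymbol{σ}^{*}} \prod_{i=1}^{N} \frac{N_{σ^{*}(i)}}{N} \oint \frac{dz}{2 π i} z^{\sum_{i=1}^{N-1} δ(σ^{*}(i), σ^{*}(i+1)) - m - 1} \\
&= \oint \frac{dz}{2 π i} z^{-(1 + m)} \sum_{\boldsymbol{σ}^{*}} \frac{N_{σ^{*}(N)}}{N} \prod_{i=1}^{N-1} \left( \frac{N_{σ^{*}(i)}}{N} z^{δ(σ^{*}(i), σ^{*}(i+1))} \right) \\
&= \oint \frac{dz}{2 π i} z^{-(1 + m)} \, \boldsymbol{1}^{\top} D (F D)^{N-1} \boldsymbol{1},
\end{aligned}

(53)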

where

D \equiv \mathrm{diag}\left(\frac{N_{1}}{N}, \dots, \frac{N_{K}}{N}\right),

F \equiv \boldsymbol{1} \boldsymbol{1}^{\top} + (z - 1) I.

(54)

Here, $I$ is the identity matrix. In Eq. (53), we used the identity

δ (x, y) = \oint \frac{d z}{2 π i} \frac{1}{z^{x - y + 1}},

(55)

which holds for integers $x$ and $y$, with the contour encircling the origin of the complex plane.

Using the eigenvalue decomposition, $F$ can be expressed as

F = \frac{z - 1 + K}{K} \boldsymbol{1} \boldsymbol{1}^{\top} + (z - 1) \sum_{k=2}^{K} \boldsymbol{v}_{k} \boldsymbol{v}_{k}^{\top},

(56)

where $\boldsymbol{v}_{k}$ ($2 \leq k \leq K$) are orthonormal eigenvectors of $F$ perpendicular to $\boldsymbol{1}$, and we have

F D \boldsymbol{1} = \frac{z - 1 + K}{K} \boldsymbol{1} + (z - 1) \sum_{k=2}^{K} \boldsymbol{v}_{k} \boldsymbol{v}_{k}^{\top} D \boldsymbol{1}.

(57)

Because the second term in Eq. (57) vanishes when the group sizes are equal (then $D \boldsymbol{1} = \boldsymbol{1}/K$ is perpendicular to every $\boldsymbol{v}_{k}$), the exact probability distribution can be derived as follows:

\begin{aligned}
P[(N - 1) C = m] &= \frac{1}{K^{N-1}} \oint \frac{dz}{2 π i} z^{-(1 + m)} (z - 1 + K)^{N-1} \\
&= \frac{1}{K^{N-1}} \oint \frac{dz}{2 π i} z^{-(1 + m)} \sum_{k=0}^{N-1} \binom{N-1}{k} z^{k} (K - 1)^{N-1-k} \\
&= \binom{N-1}{m} \left( \frac{1}{K} \right)^{m} \left( 1 - \frac{1}{K} \right)^{N-1-m}.
\end{aligned}

(58)

Equivalently,

P(C) = \binom{N - 1}{(N - 1) C} \left( \frac{1}{K} \right)^{(N - 1) C} \left( 1 - \frac{1}{K} \right)^{(N - 1)(1 - C)}.

(59)

Therefore, $(N - 1) C$ follows a binomial distribution. This result can be interpreted as follows. Suppose that $N$ elements are aligned linearly and that we assign group labels starting from one end. As we focus only on whether consecutive labels coincide, the label of the first element can be arbitrary. For each of the remaining $N - 1$ elements, the probability that its label is identical to that of the previous element is $1 / K$, and the complementary probability is $1 - 1 / K$, because the label can then be any of the other $K - 1$ labels. Summing over all patterns in which consecutive labels coincide $m$ times yields $P[C = m/(N - 1)]$.
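For equal group sizes, the binomial law of Eq. (58) can be verified exactly for small $N$ and $K$ by enumerating all $K^{N}$ label sequences, each of which is equally likely under bootstrapped labels. The sketch below (function names hypothetical) uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import comb


def continuity_count_distribution(N, K):
    """Exact P[(N - 1) C = m], m = 0, ..., N - 1, for i.i.d. uniform labels."""
    counts = [0] * N
    for seq in product(range(K), repeat=N):
        m = sum(a == b for a, b in zip(seq[:-1], seq[1:]))   # consecutive equal labels
        counts[m] += 1
    total = K ** N
    return [Fraction(c, total) for c in counts]


def binomial_pmf(N, K):
    """Eq. (58): Binomial(N - 1, 1/K) probabilities for m = 0, ..., N - 1."""
    p = Fraction(1, K)
    return [comb(N - 1, m) * p ** m * (1 - p) ** (N - 1 - m) for m in range(N)]
```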

Even when the group sizes are not equal, Eq. (58) is close to the actual distribution as long as the second term in Eq. (57) is negligible. When $N \gg 1$ and the size of each group is of constant order, i.e., $N / K = O(1)$, Eq. (58) is well approximated by a Poisson distribution. Furthermore, when $N / K \gg 1$, the distribution is nearly normal. Because $Δ = (N - K)/(N - 1) - C$, the distribution of $Δ(\boldsymbol{π}, \boldsymbol{σ})$ is obtained from Eq. (59) by reflecting it and shifting it by the constant $(N - K)/(N - 1)$.

Appendix D Label continuity errors for nested partitions

As an example of partitions with different numbers of groups, here we investigate the difference in the LCEs between a partition $\boldsymbol{σ}$ with $K$ groups and its nested partition $\boldsymbol{σ}'$. The partition $\boldsymbol{σ}'$ is obtained by subpartitioning the vertices $V_{K}$ having the $K$th group label in $\boldsymbol{σ}$ into $V_{K, 1}$ and $V_{K, 2}$ ($V_{K, 1} \cup V_{K, 2} = V_{K}$); we denote the sizes of these two groups as $N_{K, 1}'$ and $N_{K, 2}'$ ($N_{K, 1}' + N_{K, 2}' = N_{K}$) and write $\{N_{k}'\} = \{N_{1}, \dots, N_{K - 1}, N_{K, 1}', N_{K, 2}'\}$. The partitions $\boldsymbol{σ}$ and $\boldsymbol{σ}'$ differ only locally. The difference in the LCEs for $\boldsymbol{σ}$ and $\boldsymbol{σ}'$ with the same sequence $\boldsymbol{π}$ is bounded as

-\frac{1}{N - 1} \leq Δ(\boldsymbol{π}, \boldsymbol{σ}') - Δ(\boldsymbol{π}, \boldsymbol{σ}) \leq \frac{2 \min\{N_{K, 1}', N_{K, 2}'\} - δ(N_{K, 1}', N_{K, 2}') - 1}{N - 1}.

(60)

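The lower bound is trivial: subpartitioning increases the number of groups by one, which lowers the term $(N - K)/(N - 1)$ of the LCE by $1/(N - 1)$, and it can never increase the label continuity $C$. Hence, $Δ(\boldsymbol{π}, \boldsymbol{σ}') - Δ(\boldsymbol{π}, \boldsymbol{σ}) = -1/(N - 1) - [C(\boldsymbol{π}, \boldsymbol{σ}') - C(\boldsymbol{π}, \boldsymbol{σ})] \geq -1/(N - 1)$, with equality when $C$ is unchanged, e.g., when $C = 0$ before the subpartition. The upper bound of Eq. (60) can be derived as follows. The difference in the LCE is maximized when the decrease in $C$ is maximized. Note that the number of label flips can be maximized when the labels before the subpartition are aligned completely consecutively, e.g., the case in Fig. 12(a). In this case, we can maximize the decrease in $C$ by aligning the vertices in $V_{K, 1}$ and $V_{K, 2}$ as alternately as possible. The achieved difference is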

\begin{aligned}
C(\boldsymbol{π}, \boldsymbol{σ}') - C(\boldsymbol{π}, \boldsymbol{σ}) &= \begin{cases} -\dfrac{N_{K} - 1}{N - 1} & (N_{K, 1}' = N_{K, 2}'), \\ -\dfrac{2 \min\{N_{K, 1}', N_{K, 2}'\}}{N - 1} & (N_{K, 1}' \neq N_{K, 2}'), \end{cases} \\
&= \frac{δ(N_{K, 1}', N_{K, 2}') - 2 \min\{N_{K, 1}', N_{K, 2}'\}}{N - 1}.
\end{aligned}

(61)
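The bounds in Eq. (60) can be confirmed by exhaustive search over all sequences for a small example. The sketch below (function names hypothetical) compares the brute-force minimum and maximum of $Δ(\boldsymbol{π}, \boldsymbol{σ}') - Δ(\boldsymbol{π}, \boldsymbol{σ})$ with the closed-form bounds, using $Δ = (N - K)/(N - 1) - C$.

```python
from itertools import permutations


def lce(sequence, labels):
    """Delta = (N - K)/(N - 1) - C for a vertex sequence and labels indexed by vertex."""
    n, k = len(sequence), len(set(labels))
    same = sum(labels[a] == labels[b] for a, b in zip(sequence[:-1], sequence[1:]))
    return (n - k) / (n - 1) - same / (n - 1)


def nested_bounds(n, n_a, n_b):
    """Bounds of Eq. (60) when one group is split into sizes n_a and n_b."""
    lower = -1 / (n - 1)
    upper = (2 * min(n_a, n_b) - int(n_a == n_b) - 1) / (n - 1)   # int(...) is delta
    return lower, upper


def brute_force_range(labels, sub_labels):
    """Min and max of Delta(pi, sigma') - Delta(pi, sigma) over all sequences pi."""
    diffs = [lce(p, sub_labels) - lce(p, labels)
             for p in permutations(range(len(labels)))]
    return min(diffs), max(diffs)
```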

Equation (60) indicates that the LCE is a local quantity; that is, the bounds on the variation in the LCE are characterized by $N_{K, 1}'$ and $N_{K, 2}'$, and the variation tends to be small when $\min\{N_{K, 1}', N_{K, 2}'\}$ is small. However, the actual difference (as opposed to its bounds) depends not only on the subsequence within $V_{K}$ but also on the positions of the vertices in $V_{K}$ within the entire sequence $\boldsymbol{π}$ (see Fig. 12 for specific examples). This result implies that comparing LCEs is generally complicated when the partitions have different numbers of groups.

Figure 12: Effect of subpartitioning on the label continuity $C$ and the label continuity error $\Delta$. The partition $\boldsymbol{\sigma}'$ is a nested partition of $\boldsymbol{\sigma}$; the yellow label in $\boldsymbol{\sigma}$ is subpartitioned into the yellow and green labels in $\boldsymbol{\sigma}'$. (a) When the group labels are maximally consecutive (sequence $\boldsymbol{\pi}$), subpartitioning decreases $C$. (b) When the group labels are not consecutive at all (sequence $\tilde{\boldsymbol{\pi}}$), $C$ does not change regardless of the choice of the nested partition. Although $\boldsymbol{\sigma}$ and $\boldsymbol{\sigma}'$ are the same in (a) and (b), the value of $C$ is affected by the locations of the blue labels.

Acknowledgements.

This work was supported by JST ACT-X Grant No. JPMJAX21A8 (Kawamoto), JSPS KAKENHI 19H01506, 22H00827 (Kawamoto and Kobayashi), 20H05633 (Kobayashi), and Quantum Science and Technology Fellowship Program (Q-STEP) (Ochi).

References

[1] S. E. Schaeffer, Computer Science Review 1, 27 (2007).
[2] S. Fortunato, Phys. Rep. 486, 75 (2010).
[3] S. Fortunato and D. Hric, Community detection in networks: A user guide, Physics Reports 659, 1 (2016).
[4] A. Clauset, C. Moore, and M. E. Newman, Nature 453, 98 (2008).
[5] T. P. Peixoto, Phys. Rev. X 4, 011047 (2014).
[6] M. P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, SIAM Journal on Applied Mathematics 74, 167 (2014).
[7] M. S. Mariani, Z.-M. Ren, J. Bascompte, and C. J. Tessone, Nestedness in complex networks: Observation, emergence, and implications, Physics Reports 813, 1 (2019).
[8] I. Liiv, Statistical Analysis and Data Mining: The ASA Data Science Journal 3, 70 (2010).
[9] M. Behrisch, B. Bach, N. Henry Riche, T. Schreck, and J.-D. Fekete, Comput. Graph. Forum 35, 693 (2016).
[10] T. Kawamoto and T. Kobayashi, Sequential locality of graphs and its hypothesis testing (2021).
[11] L. H. Harper, Journal of the Society for Industrial and Applied Mathematics 12, 131 (1964).
[12] F. Chung, Comput. Math. with Appl. 10, 43 (1984).
[13] H. Seitz, Contributions to the minimum linear arrangement problem, Ph.D. thesis (2010).
[14] S. T. Barnard, A. Pothen, and H. Simon, Numerical Linear Algebra with Applications 2, 317 (1995).
[15] C. Ding and X. He, in Proceedings of the Twenty-First International Conference on Machine Learning, ICML ’04 (Association for Computing Machinery, New York, NY, USA, 2004) p. 30.
[16] https://app.dimensions.ai/discover/publication?search_mode=content.
[17] L. Hagen and A. Kahng, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 11, 1074 (1992).
[18] J. Shi and J. Malik, IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 888 (2000).
[19] U. Luxburg, Stat. Comput. 17, 395–416 (2007).
[20] D. Arthur and S. Vassilvitskii, in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07 (Society for Industrial and Applied Mathematics, USA, 2007) pp. 1027–1035.
[21] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[22] M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A. 103, 8577 (2006).
[23] A. Saade, F. Krzakala, and L. Zdeborová, in Advances in Neural Information Processing Systems, Vol. 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (Curran Associates, Inc., 2014).
[24] L. Dall’Amico, R. Couillet, and N. Tremblay, in Advances in Neural Information Processing Systems, Vol. 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019).
[25] K. Chaudhuri, F. Chung, and A. Tsiatas, in Proceedings of the 25th Annual Conference on Learning Theory, Proceedings of Machine Learning Research, Vol. 23, edited by S. Mannor, N. Srebro, and R. C. Williamson (PMLR, Edinburgh, Scotland, 2012) pp. 35.1–35.23.
[26] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, The Annals of Statistics 41, 2097 (2013).
[27] T. Qin and K. Rohe, in Advances in Neural Information Processing Systems, Vol. 26, edited by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger (Curran Associates, Inc., 2013).
[28] A. Joseph and B. Yu, The Annals of Statistics 44, 1765 (2016).
[29] Y. Zhang and K. Rohe, in Advances in Neural Information Processing Systems, Vol. 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc., 2018).
[30] P. W. Holland, K. B. Laskey, and S. Leinhardt, Soc. Networks 5, 109 (1983).
[31] Y. J. Wang and G. Y. Wong, J. Am. Stat. Assoc. 82, 8 (1987).
[32] T. P. Peixoto, Phys. Rev. E 85, 056122 (2012).
[33] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi, Foundations and Trends in Machine Learning 2, 129 (2010).
[34] T. P. Peixoto, Bayesian stochastic blockmodeling, in Advances in Network Clustering and Blockmodeling (John Wiley & Sons, Ltd, 2019) Chap. 11, pp. 289–332.
[35] E. Abbe, J. Mach. Learn. Res. 18, 1 (2018).
[36] A. Condon and R. M. Karp, Random Struct. Algorithms 18, 116 (2001).
[37] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Phys. Rev. Lett. 107, 065701 (2011).
[38] E. Mossel, J. Neeman, and A. Sly, Probab. Theory Relat. Fields 162, 431 (2015).
[39] L. Massoulié, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC ’14 (ACM, New York, 2014) pp. 694–703.
[40] R. R. Nadakuditi and M. E. J. Newman, Phys. Rev. Lett. 108, 188701 (2012).
[41] T. Kawamoto and Y. Kabashima, Phys. Rev. E 91, 062803 (2015).
[42] T. Kawamoto and Y. Kabashima, Eur. Phys. Lett. 112, 40007 (2015).
[43] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, J. Stat. Mech. 2005, P09008 (2005).
[44] R. K. Darst, Z. Nussinov, and S. Fortunato, Phys. Rev. E 89, 032809 (2014).
[45] Z. Yang, R. Algesheimer, and C. J. Tessone, Scientific Reports 6, 1 (2016).
[46] U. Von Luxburg, M. Belkin, and O. Bousquet, Ann. Statist., 555 (2008).
[47] W. W. Zachary, Journal of Anthropological Research 33, 452 (1977).
[48] V. Krebs, The political books network.
[49] https://graph-tool.skewed.de.
[50] J. G. White, E. Southgate, J. N. Thomson, S. Brenner, et al., Philos. Trans. R. Soc. Lond. B Biol. Sci. 314, 1 (1986).
[51] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 396 (2003).
[52] M. Girvan and M. E. Newman, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
[53] T. S. Evans, J. Stat. Mech.: Theory Exp. 2010 (12), P12037 (2010).
[54] D. E. Knuth, The Stanford GraphBase: A Platform for Combinatorial Computing, Vol. 1 (ACM Press, New York, 1993).
[55] L. A. Adamic and N. Glance, in Proceedings of the 3rd International Workshop on Link Discovery (2005) pp. 36–43.
[56] E. B. Baskerville, A. P. Dobson, T. Bedford, S. Allesina, T. M. Anderson, and M. Pascual, PLoS Computational Biology 7, e1002321 (2011).

Supplementary Materials

“Consistency between ordering and clustering methods for graphs”

Tatsuro Kawamoto, Masaki Ochi and Teruyoshi Kobayashi

S1 Hyperparameter dependency in the Bethe Hessian

This section investigates how spectral ordering with the Bethe Hessian $B$ depends on the hyperparameter $r$ and on the choice of eigenvector. Figure S1 shows the value of $H_2(\hat{\boldsymbol{\pi}}; A)$ achieved by each estimate $\hat{\boldsymbol{\pi}}$ obtained from the $k$th eigenvector ($k = 1, 2, 3, 4$) as the hyperparameter $r$ is swept. In this experiment, we used an instance of the ORGM with $N = 100$, $c = 6$, $\epsilon = 0.1$, and $r/N = 0.1$. The dashed line in the figure represents the default value of $r$.

This result indicates that the eigenvector minimizing $H_2(\hat{\boldsymbol{\pi}}; A)$ changes with $r$, particularly when $r$ is relatively small. However, when $r$ is sufficiently large, the estimate $\hat{\boldsymbol{\pi}}$ based on the eigenvector $\boldsymbol{\nu}_2$ associated with the second-smallest eigenvalue yields the lowest value of $H_2(\hat{\boldsymbol{\pi}}; A)$.

Based on this observation, we employ $\boldsymbol{\nu}_2$ for ordering with the Bethe Hessian. Note also that, as shown in Fig. S1, the global minimum of $H_2(\hat{\boldsymbol{\pi}}; A)$ is typically achieved when $r$ is lower than the default value we employed. Therefore, although this is beyond the scope of this study, optimizing $r$ could further improve performance.
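A minimal NumPy sketch of spectral ordering with the Bethe Hessian is given below. It assumes the standard form $B(r) = (r^2 - 1)I - rA + D$ of Saade et al. [23], with $D$ the diagonal degree matrix, and assumes that $H_2(\boldsymbol{\pi}; A)$ is the sum over edges of the squared distance between the endpoints in the sequence; the normalization used in this study may differ, and the default value of $r$ is left as an input. All function names are hypothetical. `best_eigenvector_per_r` mimics the sweep in Fig. S1 by reporting, for each $r$, which eigenvector $\boldsymbol{\nu}_k$ yields the smallest $H_2$.

```python
import numpy as np


def bethe_hessian(A, r):
    """Assumed form B(r) = (r^2 - 1) I - r A + D for a symmetric adjacency matrix A."""
    degrees = A.sum(axis=1)
    return (r ** 2 - 1) * np.eye(A.shape[0]) - r * A + np.diag(degrees)


def spectral_order(A, r, k=2):
    """Rank of each vertex when sorting by the eigenvector of B(r) for the
    k-th smallest eigenvalue (k = 2 gives nu_2). rank[i] is the 0-based
    position of vertex i; the eigenvector sign is arbitrary, so the
    sequence may come out reversed (H_2 is unaffected)."""
    _, vecs = np.linalg.eigh(bethe_hessian(A, r))  # eigenvalues in ascending order
    order = np.argsort(vecs[:, k - 1], kind="stable")
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return rank


def h2(rank, A):
    """Assumed H_2: sum over edges {i, j} of (pi_i - pi_j)^2, each edge counted once."""
    diff = rank[:, None] - rank[None, :]
    return 0.5 * float(np.sum(A * diff ** 2))


def best_eigenvector_per_r(A, r_values, ks=(1, 2, 3, 4)):
    """For each r, the index k whose eigenvector ordering gives the smallest H_2
    (ties go to the first k in `ks`)."""
    return {r: min(ks, key=lambda k: h2(spectral_order(A, r, k), A)) for r in r_values}
```

At $r = 1$ the Bethe Hessian reduces to the combinatorial Laplacian $D - A$, so on a path graph `spectral_order(A, 1.0, k=2)` recovers the path order (up to reversal) with $H_2 = N - 1$.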

Figure S1: Dependence of the achieved value of $H_2(\hat{\boldsymbol{\pi}}; A)$ on the hyperparameter $r$ of the Bethe Hessian. Each line represents the result for the sequence $\hat{\boldsymbol{\pi}}$ estimated from the $k$th ($k = 1, 2, 3, 4$) eigenvector $\boldsymbol{\nu}_k$.

Dataset	$N$	$M$	Data description	Reference
adjnoun	112	425	Word adjacencies of common adjectives and nouns in the novel David Copperfield.	[21]
celegans	297	2,359	Neural connections of the C. elegans nematode.	[50]
dolphins	62	159	Frequent associations among dolphins in a community.	[51]
football	115	613	Network of American football games between Division IA colleges.	[52], [53]
karate	34	77	Network of friendships among members of a university karate club.	[47]
lesmis	77	254	Character co-appearance network of Les Misérables.	[54]
netscience	1,589	2,742	Collaboration network among scientists working on network science.	[21]
polblogs	1,490	19,090	Network of hyperlinks among U.S. political blogs.	[55]
polbooks	105	441	Co-purchase network of books on U.S. politics.	[48]
foodweb	161	592	Plant and mammal food web in the Serengeti savanna ecosystem.	[56]

Table S1: Description of empirical datasets, where $N$ and $M$ denote the numbers of vertices and edges, respectively. All the data can be loaded from graph-tool [49].

Figure S2: Performance of the spectral clustering method on graphs generated by the ORGM with $N = 1{,}000$ and $c = 6$. Each panel shows the fraction of the largest group, $\max_{k} N_{k} / N$, for various parameter sets of the ORGM, for each matrix used in the spectral clustering. Each point represents the average of $\max_{k} N_{k} / N$ over $10$ samples with the same parameter set.

Figure S3: Phase diagrams of the normalized LCEs in the $(r/N, \epsilon)$-space. For graphs generated by the ORGM with $N = 1{,}000$, we applied the spectral clustering methods with $K = 2$ and measured the normalized LCE. Each point represents the average of the normalized LCE over $30$ samples with the same parameter set. The average degree is set to (a) $c = 6$ and (b) $c = 12$.

Figure S4: Adjacency matrices with vertices ordered by the spectral ordering methods ($K = 2$). Colors denote vertex groups inferred by the K-means method. Edges between vertices in different groups are colored gray.

Figure S8: Effect of the regularization parameter $\tau$ on spectral ordering based on the Bethe Hessian and the regularized Laplacian for the karate club data ($K = 2$). The values $\tau = -0.8, -0.3, 0, 1$, and $9$ correspond to $r = 5, 1.5, 1, 0.5$, and $0.1$, respectively. Note that for $\tau = 0$ and $r = 1$, the optimized sequences for the Bethe Hessian and the regularized Laplacian are almost identical to that for the normalized Laplacian. Colors denote vertex groups inferred by the K-means method. Edges between vertices in different groups are colored gray.

Figure S10: Normalized LCE for $K = 2$ .