The graph bottleneck identity
A matrix $S=(s_{ij})\in{\mathbb R}^{n\times n}$ is said to determine a \emph{transitional measure} for a digraph $G$ on $n$ vertices if for all $i,j,k\in\{1,\ldots,n\},$ the \emph{transition inequality} $s_{ij} s_{jk}\le s_{ik} s_{jj}$ holds and reduce…
Authors: Pavel Chebotarev
Two interesting properties of several well-known proximity/similarity measures $s(i,j)=s_{ij}$ for digraph vertices are that $s_{ij}s_{jk}\le s_{ik}s_{jj}$ and that $s_{ij}s_{jk}=s_{ik}s_{jj}$ if and only if every path from $i$ to $k$ contains $j$. We call these the transition inequality and the graph bottleneck identity, respectively. For the path accessibility with a sufficiently small parameter and also for the connection reliability, the route accessibility, and two versions of the directed forest accessibility, the foregoing properties are proved in Sections 5 and 6 below. In Sections 3 and 4, we show that every positive-valued function with the above properties (we call such functions transitional measures) gives rise to a logarithmic metric that is graph-geodetic, i.e., such that $d(i,j)+d(j,k)=d(i,k)$ if and only if every path connecting $i$ and $k$ contains $j$. As a synonym of metric, we use the term distance, i.e., a distance is assumed to satisfy the triangle inequality. Graph-geodetic distances are useful, in particular, because they enable one to instantly check, for any vertices $i$, $j$, and $k$, whether there are paths connecting $i$ and $k$ that do not pass through $j$. Moreover, they have interesting mathematical properties. In the rest of this section, we introduce some graph-theoretic notation and basic results mainly used in Sections 5 and 6.
Let $\Gamma$ be a weighted directed multigraph (in what follows, for brevity, a "digraph $\Gamma$") with vertex set $V=V(\Gamma)=\{1,\ldots,n\}$, $n>1$. Assume that $\Gamma$ has no loops. For $i,j\in V$, let $n_{ij}\in\{0,1,\ldots\}$ be the number of arcs emanating from $i$ to $j$ in $\Gamma$; for every $p\in\{1,\ldots,n_{ij}\}$, let $w_{ij}^p>0$ be the weight of the $p$th arc directed from $i$ to $j$ in $\Gamma$; let $w_{ij}=\sum_{p=1}^{n_{ij}}w_{ij}^p$ (if $n_{ij}=0$, we set $w_{ij}=0$) and $W=(w_{ij})_{n\times n}$. $W$ is the matrix of total arc weights. The outdegree and indegree of vertex $i$ are $\mathrm{od}(i)=\sum_{j=1}^{n}n_{ij}$ and $\mathrm{id}(i)=\sum_{j=1}^{n}n_{ji}$, respectively. By the weight of a digraph $H$, $w(H)$, we mean the product of the weights of all its arcs. If $H$ has no arcs, then $w(H)=1$. The weight of a finite or denumerable set $S$, $w(S)$, is the sum of the weights of the elements in $S$; the weight of the empty set is zero. If $S$ is finite and contains digraphs whose arc weights are unity (i.e., the digraphs in $S$ are actually unweighted), then $w(S)$ is equal to the cardinality of $S$.
For $v_0,v_k\in V(\Gamma)$, a $v_0\to v_k$ path in $\Gamma$ is an alternating sequence of vertices and arcs $v_0,a_1,v_1,\ldots,a_k,v_k$ where all vertices are distinct and each $a_i$ is a $v_{i-1}\to v_i$ arc. The unique $v_0\to v_0$ path is the "sequence" $v_0$ having no arcs. The length of a path is the number $k$ of its arcs. The weight of a path is the product of the weights of its arcs. The weight of a $v_0\to v_0$ path is 1. A digraph is strong (or strongly connected) if for every pair of vertices $v$ and $v'$, it has a $v\to v'$ path. A digraph is weakly connected if the corresponding undirected graph is connected.
A converging tree is a weakly connected weighted digraph in which one vertex, called the root, has outdegree zero and the remaining vertices have outdegree one. A converging forest is a weighted digraph all of whose weakly connected components are converging trees. The roots of these trees are referred to as the roots of the converging forest. A spanning converging forest of Γ is called an in-forest of Γ.
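For a fixed digraph $\Gamma$, by $\mathcal F^{\to\bullet}$ and $\mathcal F^{i\to\bullet j}$ we denote the set of all in-forests of $\Gamma$ and the set of all in-forests of $\Gamma$ that have vertex $i$ belonging to a tree rooted at $j$, respectively. Let
$$f=w(\mathcal F^{\to\bullet}),\qquad f_{ij}=w(\mathcal F^{i\to\bullet j}),\quad i,j\in V;\tag{1}$$
$F=(f_{ij})_{n\times n}$ is the matrix of in-forests of $\Gamma$.
Let $L=(\ell_{ij})$ be the Laplacian matrix of $\Gamma$, i.e., for $i,j=1,\ldots,n$,
$$\ell_{ij}=\begin{cases}-w_{ij}, & j\ne i,\\ \sum_{k\ne i}w_{ik}, & j=i.\end{cases}\tag{2}$$
Consider the matrix
$$Q=(q_{ij})=(I+L)^{-1},\tag{3}$$
where $I$ is the identity matrix. By the matrix forest theorem [6,4] ("undirected" versions of this theorem can be found in [5,14]), for any digraph $\Gamma$, $Q$ does exist and
$$q_{ij}=\frac{f_{ij}}{f},\quad i,j=1,\ldots,n.\tag{4}$$
Therefore, $F=fQ=f\cdot(I+L)^{-1}$. The matrix $Q$ can be considered as a proximity (similarity) matrix of $\Gamma$ [6,3]; it has a random walk interpretation [3, Section 4]; in the case of undirected graphs, it is also called the regularized Laplacian kernel (cf. [18]).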
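The identity $F=f\cdot(I+L)^{-1}$ can be checked by brute force on a small example. The sketch below is only an illustration, not part of the paper's argument: it assumes a digraph without parallel arcs given as a dictionary `w` mapping an arc `(i, j)` to its weight (a hypothetical input format), enumerates all in-forests directly from their definition, and tests the equivalent integer identity $F\,(I+L)=f\,I$, which avoids matrix inversion.

```python
from itertools import product

def in_forest_matrix(n, w):
    """Brute-force F = (f_ij) and f; w maps an arc (i, j) to its positive weight."""
    out_arcs = [[j for j in range(n) if (i, j) in w] for i in range(n)]
    F = [[0] * n for _ in range(n)]
    f = 0
    # In an in-forest every vertex is a root (None) or keeps exactly one outgoing arc.
    for choice in product(*[[None] + out_arcs[i] for i in range(n)]):
        roots = []
        for i in range(n):
            v, steps = i, 0
            while choice[v] is not None and steps < n:
                v, steps = choice[v], steps + 1
            if choice[v] is not None:      # n arcs followed without reaching a root: cycle
                break
            roots.append(v)
        else:                              # no cycle found: the choice is an in-forest
            weight = 1
            for i in range(n):
                if choice[i] is not None:
                    weight *= w[(i, choice[i])]
            f += weight
            for i in range(n):
                F[i][roots[i]] += weight
    return F, f

def laplacian(n, w):
    """Laplacian with zero row sums: l_ij = -w_ij for j != i, l_ii = sum of out-weights."""
    L = [[0] * n for _ in range(n)]
    for (i, j), wij in w.items():
        L[i][j] -= wij
        L[i][i] += wij
    return L

def satisfies_matrix_forest_theorem(n, w):
    """Check F (I + L) = f I, which is equivalent to F = f (I + L)^{-1}."""
    F, f = in_forest_matrix(n, w)
    L = laplacian(n, w)
    M = [[L[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    FM = [[sum(F[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return all(FM[i][j] == (f if i == j else 0) for i in range(n) for j in range(n))

# Hypothetical example digraph on vertices 0, 1, 2 with integer arc weights.
example = {(0, 1): 2, (1, 2): 3, (2, 0): 1, (0, 2): 1}
```

The enumeration is exponential in the number of vertices, so it is meant only for checking tiny examples such as `satisfies_matrix_forest_theorem(3, example)`.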
In Sections 5 and 6, we show that the values f ij and several other proximity indices satisfy the transition inequality and the graph bottleneck identity. Some general implications of these properties (mainly relating to the construction of graph distances) are studied in Sections 2, 3, and 4. The results obtained have undirected counterparts; one of them is presented in Section 7. In [2], the approach of this paper is used to fill the gap between the shortest path distance and the resistance distance for undirected graphs.
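We say that a matrix $S=(s_{ij})\in\mathbb R^{n\times n}$ satisfies the transition inequality if for all $1\le i,j,k\le n$,
$$s_{ij}\,s_{jk}\le s_{ik}\,s_{jj}.\tag{5}$$
Lemma 1. If $S=(s_{ij})\in\mathbb R^{n\times n}$ satisfies the transition inequality, then for all $1\le i,j\le n$,
$$s_{ij}\,s_{ji}\le s_{ii}\,s_{jj}.\tag{6}$$
Proof. This is immediate by setting $k=i$ in (5).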
Remark 1. Inequality (6) bears a close analogy to the Cauchy-Bunyakovsky-Schwarz inequality. Therefore, if $S$ is symmetric, has positive diagonal, and satisfies (5), then it can be treated as a matrix of variances and covariances or a Gram matrix. As a result, say, $\arccos\bigl(s_{ij}/\sqrt{s_{ii}s_{jj}}\bigr)$ can be considered as the angle between the objects represented by $i$ and $j$, which is suitable for scaling purposes; see also [1, Section 7.9]. Finally, the transition inequality is a multiplicative analogue of the triangle inequality for proximities [6,7], also called the "unrooted correlation triangle inequality" [9].
Furthermore, we say that a matrix $S=(s_{ij})\in\mathbb R^{n\times n}$ satisfies the graph bottleneck identity with respect to a digraph $\Gamma$ (an undirected graph $G$) with vertex set $\{1,\ldots,n\}$ if for all $1\le i,j,k\le n$,
$$s_{ij}\,s_{jk}=s_{ik}\,s_{jj}\tag{7}$$
holds if and only if all directed paths in $\Gamma$ (all paths in $G$) from $i$ to $k$ contain $j$.
Eq. (7) is referred to as the graph bottleneck identity because it pertains to the case where $j$ is a kind of bottleneck (or cut point) for the $i\to k$ paths: the removal of $j$ disconnects $k$ from $i$.
To shorten the terminology, we give the following definition.
Definition 1. Given a digraph $\Gamma$ with vertex set $V=\{1,\ldots,n\}$, suppose that a matrix $S=(s_{ij})_{n\times n}$ satisfies the transition inequality (5) and the graph bottleneck identity (7) w.r.t. $\Gamma$. Then we say that $S$ determines the transitional measure $s(i,j)=s_{ij}$, $i,j\in V$, for $\Gamma$.
For undirected graphs, the notion of transitional measure is defined similarly. It will be shown in Sections 5 and 6 that several popular graph proximity measures are transitional.
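The two defining conditions can be tested mechanically for a given matrix and digraph. The following is a minimal sketch under stated assumptions: the digraph is passed as a list of arcs `(a, b)`, the matrix entries are exact `Fraction` values so that equality in the bottleneck identity can be tested without rounding, and the function names are hypothetical. The condition "every path from $i$ to $k$ contains $j$" is checked as unreachability of $k$ from $i$ after deleting $j$ (it holds trivially when $j=i$ or $j=k$, and vacuously when there is no $i\to k$ path).

```python
from fractions import Fraction

def every_path_contains(arcs, i, k, j):
    """True if every directed i -> k path passes through j (vacuously true if none exists)."""
    if j == i or j == k:
        return True
    seen, stack = {i}, [i]
    while stack:                           # depth-first search with vertex j deleted
        u = stack.pop()
        for a, b in arcs:
            if a == u and b != j and b not in seen:
                seen.add(b)
                stack.append(b)
    return k not in seen

def determines_transitional_measure(S, arcs):
    """Check the transition inequality and the graph bottleneck identity for all triples."""
    n = len(S)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                lhs, rhs = S[i][j] * S[j][k], S[i][k] * S[j][j]
                if lhs > rhs:
                    return False           # transition inequality violated
                if (lhs == rhs) != every_path_contains(arcs, i, k, j):
                    return False           # equality does not match the bottleneck condition
    return True

# Symmetric path 0 <-> 1 <-> 2; S holds s_ij = (1/2)^|i-j|, the path accessibilities
# of this digraph for unit arc weights and parameter 1/2.
arcs = [(0, 1), (1, 0), (1, 2), (2, 1)]
half = Fraction(1, 2)
S = [[half ** abs(i - j) for j in range(3)] for i in range(3)]
ones = [[Fraction(1)] * 3 for _ in range(3)]   # satisfies the inequality, not the identity
```

With floating-point entries the equality test would need a tolerance, which is why exact rationals are used in this sketch.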
Lemma 2. If $S=(s_{ij})\in\mathbb R^{n\times n}$ determines a transitional measure for some digraph $\Gamma$, then for all $1\le i,j\le n$ such that $j\ne i$,
$$s_{ij}\,s_{ji}<s_{ii}\,s_{jj}.\tag{8}$$
Proof. Setting $k=i$ in (5) and taking into account that there is a path of length 0 from $i$ to $k=i$ that does not contain $j\ne i$, we conclude that the transition inequality and the graph bottleneck identity yield (8).
The main objects of interest in this paper are the distances constructed on the basis of transitional measures.
If a matrix $S$ satisfies the transition inequality (5) and its off-diagonal entries are positive, then all the entries of $S$ are positive. In this case, define the matrix
$$H=\overrightarrow{\ln S},\tag{9}$$
where $\overrightarrow{\varphi(S)}$ stands for elementwise operations, i.e., operations applied to each entry of $S$ separately. Consider the matrix
$$D=\tfrac12\bigl(h\mathbf 1^{T}+\mathbf 1h^{T}-H-H^{T}\bigr),\tag{10}$$
where $h$ is the column vector containing the diagonal entries of $H$, $\mathbf 1$ is the column of $n$ ones, and $H^{T}$, $h^{T}$, and $\mathbf 1^{T}$ are the transposes of $H$, $h$, and $\mathbf 1$. An alternative form of (10) is $D=(U+U^{T})/2$, where $U=h\mathbf 1^{T}-H$, and the elementwise form is
$$d_{ij}=\tfrac12\,(h_{ii}+h_{jj}-h_{ij}-h_{ji}),\quad i,j=1,\ldots,n,$$
where $H=(h_{ij})$ and $D=(d_{ij})$. This is a standard transformation used to obtain a distance from a proximity measure (cf. the inverse covariance mapping in [9] and [1, Section 12.1]).
Theorem 1. If S = (s ij ) n×n determines a transitional measure for some digraph Γ and has positive off-diagonal entries, then D = (d ij ) n×n defined by (9) and (10) is a matrix of distances on {1, . . . , n}.
Before proving Theorem 1 we give an expression for the entries of $D$. Eqs. (9) and (10) imply that for every $i,j=1,\ldots,n$,
$$d_{ij}=\frac12\ln\frac{s_{ii}\,s_{jj}}{s_{ij}\,s_{ji}}.\tag{11}$$
Proof of Theorem 1. The proof amounts to showing that for all $i,j,k\in\{1,\ldots,n\}$:
(i) $d_{ij}=0$ if and only if $i=j$, and
(ii) $d_{ij}+d_{jk}-d_{ki}\ge0$.
Indeed, the symmetry and non-negativity of $D$, which are sometimes considered as part of the definition of distance, follow from (i) and (ii). Since $S$ has positive off-diagonal entries, the transition inequality implies the positivity of $S$.
To prove (i), note that if $i=j$, then by (10), $d_{ij}=0$. Conversely, if $d_{ij}=0$, then by (11), $s_{ii}s_{jj}=s_{ij}s_{ji}$ holds, which, by Lemma 2, implies that $i=j$.
To prove (ii), observe that by (9), (10), and the transition inequality (5),
$$d_{ij}+d_{jk}-d_{ki}=\frac12\ln\left(\frac{s_{ki}\,s_{jj}}{s_{kj}\,s_{ji}}\cdot\frac{s_{ik}\,s_{jj}}{s_{ij}\,s_{jk}}\right)\ge0\tag{12}$$
holds, since each of the two fractions is at least 1 by (5). This completes the proof.
Based on Theorem 1, we give the following definition.
Definition 2. Suppose that $S=(s_{ij})_{n\times n}$ has positive off-diagonal entries and determines a transitional measure for some digraph $\Gamma$. The logarithmic distance corresponding to $S$ is the function $d:\{1,\ldots,n\}^2\to\mathbb R$ such that $d(i,j)=d_{ij}$, $i,j=1,\ldots,n$, where $D=(d_{ij})$ is defined by (9) and (10).
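As a small numerical illustration of Definition 2 and Theorem 1, the sketch below computes the logarithmic distance from a positive matrix and checks the distance axioms. It assumes, as the name "logarithmic distance" suggests, that $H$ is the entrywise natural logarithm of $S$, so that $d_{ij}=\tfrac12(h_{ii}+h_{jj}-h_{ij}-h_{ji})$; the function names and the example matrix are hypothetical.

```python
import math

def logarithmic_distance(S):
    """d_ij = (h_ii + h_jj - h_ij - h_ji) / 2 with H the entrywise natural log of S."""
    n = len(S)
    H = [[math.log(S[i][j]) for j in range(n)] for i in range(n)]
    return [[0.5 * (H[i][i] + H[j][j] - H[i][j] - H[j][i]) for j in range(n)]
            for i in range(n)]

def is_distance_matrix(D, tol=1e-12):
    """Zero diagonal, positive off-diagonal entries, symmetry, and the triangle inequality."""
    n = len(D)
    zero_diag = all(abs(D[i][i]) <= tol for i in range(n))
    positive = all(D[i][j] > tol for i in range(n) for j in range(n) if i != j)
    symmetric = all(abs(D[i][j] - D[j][i]) <= tol for i in range(n) for j in range(n))
    triangle = all(D[i][j] + D[j][k] >= D[i][k] - tol
                   for i in range(n) for j in range(n) for k in range(n))
    return zero_diag and positive and symmetric and triangle

# Path accessibilities of the symmetric path 0 <-> 1 <-> 2 with unit arc weights
# and parameter 1/2: s_ij = (1/2)^|i-j|.
S = [[1.0, 0.5, 0.25],
     [0.5, 1.0, 0.5],
     [0.25, 0.5, 1.0]]
D = logarithmic_distance(S)
```

For this example the logarithmic distance is proportional to the shortest path distance, with factor $\ln 2$, because each pair of vertices is joined by a single path.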
In Section 4, it is shown that every distance of this kind is graph-geodetic.
4 The graph bottleneck identity implies the geodetic property of the logarithmic distance
Definition 3. For a multidigraph $\Gamma$ (a multigraph $G$) with vertex set $V$, a function $d:V\times V\to\mathbb R$ is called graph-geodetic provided that $d(i,j)+d(j,k)=d(i,k)$ holds if and only if every directed path in $\Gamma$ connecting $i$ and $k$ in either direction (every path in $G$ connecting $i$ and $k$) contains $j$.
If $d(\cdot,\cdot)$ is a distance on digraph vertices, then the property of being graph-geodetic (this term is taken from [13]) is a natural condition of strengthening the triangle inequality to equality. Knowing a graph-geodetic distance enables one to instantly check whether $j$ "separates" $i$ and $k$ or not for any $i,j,k\in V(\Gamma)$. The classical shortest path distance clearly possesses the "if" (but not the "only if") part of the graph-geodetic property; the "if" part of this property for the resistance distance was proved in [12]. The ordinary distance in a Euclidean space satisfies a similar condition resulting from substituting "line segment" for "path in $G$."
Theorem 2. Suppose that $S=(s_{ij})_{n\times n}$ has positive off-diagonal entries and determines a transitional measure for some digraph $\Gamma$. Then the logarithmic distance corresponding to $S$ is graph-geodetic for $\Gamma$.
Proof. Using (12) and the transition inequality we conclude that $d_{ij}+d_{jk}=d_{ki}$ is true if and only if
$$s_{ij}\,s_{jk}=s_{ik}\,s_{jj}\quad\text{and}\quad s_{kj}\,s_{ji}=s_{ki}\,s_{jj}.$$
In turn, by the graph bottleneck identity, this holds if and only if every path in $\Gamma$ connecting $i$ and $k$ in either direction contains $j$. Thus, by Definition 3, the logarithmic distance $d(i,j)=d_{ij}$ ($i,j=1,\ldots,n$) corresponding to $S$ is graph-geodetic for $\Gamma$.
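The graph-geodetic property can be verified numerically on a small strong digraph. The sketch below is an illustration under assumptions: it uses the symmetric path on four vertices, for which the path accessibilities with unit arc weights are $s_{ij}=\tau^{|i-j|}$ (each ordered pair is joined by exactly one path), takes the logarithmic distance in the form $d_{ij}=\tfrac12\ln\bigl(s_{ii}s_{jj}/(s_{ij}s_{ji})\bigr)$, and compares equality in the triangle inequality with the separation condition for every triple. The helper names and the tolerance are hypothetical choices.

```python
import math

def separates(arcs, i, k, j):
    """True if every directed path from i to k and from k to i contains j."""
    def blocked(src, dst):
        if j in (src, dst):
            return True
        seen, stack = {src}, [src]
        while stack:                       # search in the digraph with vertex j deleted
            u = stack.pop()
            for a, b in arcs:
                if a == u and b != j and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return dst not in seen
    return blocked(i, k) and blocked(k, i)

def is_graph_geodetic(D, arcs, tol=1e-9):
    """d_ij + d_jk = d_ik must hold exactly when j separates i and k."""
    n = len(D)
    return all(
        (abs(D[i][j] + D[j][k] - D[i][k]) <= tol) == separates(arcs, i, k, j)
        for i in range(n) for j in range(n) for k in range(n)
    )

tau = 0.3
n = 4
# Symmetric path 0 <-> 1 <-> 2 <-> 3 with unit arc weights.
arcs = [(i, i + 1) for i in range(n - 1)] + [(i + 1, i) for i in range(n - 1)]
S = [[tau ** abs(i - j) for j in range(n)] for i in range(n)]
D = [[0.5 * math.log(S[i][i] * S[j][j] / (S[i][j] * S[j][i])) for j in range(n)]
     for i in range(n)]
```

Here `is_graph_geodetic(D, arcs)` compares floating-point values with a tolerance; for larger examples an exact or symbolic representation of $S$ would be a safer choice.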
Graph-geodetic functions have many interesting properties. One of them, as mentioned in [12], is a simple connection (such as that obtained in [10]) between the cofactors and the determinant of Γ's distance matrix and those of the maximal blocks of Γ that have no cut points. Another property is the recursive Theorem 8 in [13]. The graph-geodetic distances are not Euclidean; however, by Blumenthal's "Square-Root" theorem, the corresponding "square-rooted" distances satisfy the 3-Euclidean condition (see, e.g., [13]).
Obviously, it is (9) that guarantees the graph-geodetic property of the matrix $D$ obtained by means of (10) from a transitional measure. If $H=S$, then this property is not secured, and a sufficient condition for $D$ to be a distance matrix is provided by the following proposition.
Proposition 1. Suppose that $S=(s_{ij})_{n\times n}$ satisfies the transition inequality (5) and
$$s_{jj}>\min(s_{ij},s_{ji}),\quad s_{jj}\ge\max(s_{ij},s_{ji}),\quad\text{and}\quad s_{jj}>0\quad\text{for all } i,j=1,\ldots,n,\ j\ne i.\tag{13}$$
Then $D$ defined by (10) with $H=S$ is a matrix of distances.
Proof. Assuming that (5) and (13) are satisfied, we prove that (i) $d_{ij}=0$ if and only if $i=j$ and (ii) $d_{ij}+d_{jk}-d_{ki}\ge0$ for all $i,j,k=1,\ldots,n$. Since by (10),
$$d_{ij}=\tfrac12\,(s_{ii}+s_{jj}-s_{ij}-s_{ji}),\quad i,j=1,\ldots,n,\tag{14}$$
holds, $(j=i)\Rightarrow(d_{ij}=0)$ is immediate and $(j\ne i)\Rightarrow(d_{ij}\ne0)$ follows from (13). Furthermore, since by (13) $s_{jj}>0$, (5) implies that $s_{ik}\ge s_{ij}s_{jk}s_{jj}^{-1}$ and $s_{ki}\ge s_{kj}s_{ji}s_{jj}^{-1}$; therefore, by (14) and (13),
$$d_{ij}+d_{jk}-d_{ki}=\tfrac12\,(2s_{jj}-s_{ij}-s_{jk}+s_{ik}-s_{kj}-s_{ji}+s_{ki})\ge\frac{(s_{jj}-s_{ij})(s_{jj}-s_{jk})+(s_{jj}-s_{kj})(s_{jj}-s_{ji})}{2\,s_{jj}}\ge0.$$
In Sections 5 and 6, we show that several well-known graph proximity measures are transitional.
In this section, we consider two instances of transitional measures. With relation to the graph bottleneck identity, they represent a very special case in which for every i ∈ V, s ii = 1.
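The path $\tau$-accessibility of $j$ from $i$ in $\Gamma$ is the total $\tau$-weight of all paths from $i$ to $j$:
$$s_{ij}=\sum_{P_{ij}\in\mathcal P_{ij}}w_\tau(P_{ij}),\tag{15}$$
where $\mathcal P_{ij}$ is the set of all $i\to j$ paths in $\Gamma$,
$$w_\tau(P_{ij})=\tau^{\,l(P_{ij})}\,w(P_{ij}),$$
$l(P_{ij})$ and $w(P_{ij})$ are the length and the weight of $P_{ij}$, and $\tau>0$. By definition, for every $i\in V$, the unique "path from $i$ to $i$" is the path of length 0 whose weight is unity, whence $s_{ii}=1$, $i=1,\ldots,n$.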
Theorem 3. For any digraph Γ, there exists τ 0 > 0 such that for every τ ∈ (0, τ 0 ), S = (s ij ) defined by (15) determines a transitional measure for Γ.
Proof. For arbitrary $i,j,k\in V$, $P_{ij}\in\mathcal P_{ij}$, and $P_{jk}\in\mathcal P_{jk}$, let $v$ be the first (along $P_{ij}$) vertex of $P_{ij}$ that belongs to $P_{jk}$. Then combining the $i\to v$ subpath of $P_{ij}$ with the $v\to k$ subpath of $P_{jk}$ we obtain a well-defined path $P_{ik}\in\mathcal P_{ik}$ whose $\tau$-weight is no less than $w_\tau(P_{ij})\,w_\tau(P_{jk})$ for each sufficiently small $\tau>0$. If this $P_{ik}$ contains $j$ (i.e., $v=j$), then
$$w_\tau(P_{ik})=w_\tau(P_{ij})\,w_\tau(P_{jk})\tag{16}$$
for every $\tau>0$. Otherwise, if a fixed $P_{ik}$ does not contain $j$, then a $\tau_0(P_{ik},j)>0$ can be chosen in such a way that
$$w_\tau(P_{ik})>\sum_{(P_{ij},P_{jk})\to P_{ik}}w_\tau(P_{ij})\,w_\tau(P_{jk})\tag{17}$$
for all $0<\tau<\tau_0(P_{ik},j)$, where the sum is taken over all $P_{ij}\in\mathcal P_{ij}$ and $P_{jk}\in\mathcal P_{jk}$ such that combining the $i\to v$ subpath of $P_{ij}$ with the $v\to k$ subpath of $P_{jk}$ produces the fixed $P_{ik}$ (which is denoted by $(P_{ij},P_{jk})\to P_{ik}$). Let $\tau_0=\min_{i,j,k\in V,\;P_{ik}\in\overline{\mathcal P}_{ik}}\{\tau_0(P_{ik},j)\}$, where $\overline{\mathcal P}_{ik}$ is the set of all $i\to k$ paths in $\Gamma$ that do not contain $j$. Thus, if $0<\tau<\tau_0$, then (17) holds for all $P_{ik}\in\overline{\mathcal P}_{ik}$ and (16) holds for all $P_{ik}\in\mathcal P_{ik}\setminus\overline{\mathcal P}_{ik}$. Consequently, for any $\tau\in(0,\tau_0)$ and any $i,j,k\in V$,
$$s_{ij}\,s_{jk}\le s_{ik}=s_{ik}\,s_{jj},$$
with the equality if and only if every $i\to k$ path contains $j$. The transition inequality and the graph bottleneck identity follow.
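The role of the bound on $\tau$ in Theorem 3 can be seen on a toy example. The sketch below assumes that the $\tau$-weight of a path is $\tau^{\,l}$ times its ordinary weight, with $l$ the path length, and computes the path accessibilities by enumerating simple paths; the input format (a dictionary from arcs to weights), the function names, and the four-vertex digraph are hypothetical. In this digraph the paths $0\to3\to1$ and $1\to3\to2$ combine into the shorter path $0\to3\to2$, so the transition inequality holds for $\tau=1/2$ but fails for $\tau=2$.

```python
from fractions import Fraction

def path_accessibility(n, w, tau):
    """S = (s_ij): total tau-weight of all simple i -> j paths (s_ii = 1)."""
    S = [[Fraction(0)] * n for _ in range(n)]

    def extend(start, v, visited, weight):
        S[start][v] += weight              # the current simple path ends at v
        for (a, b), wab in w.items():
            if a == v and b not in visited:
                extend(start, b, visited | {b}, weight * tau * wab)

    for i in range(n):
        extend(i, i, {i}, Fraction(1))     # the path of length 0 has weight 1
    return S

def transition_inequality_holds(S):
    n = len(S)
    return all(S[i][j] * S[j][k] <= S[i][k] * S[j][j]
               for i in range(n) for j in range(n) for k in range(n))

# Hypothetical digraph with unit arc weights: 0 -> 3, 3 -> 1, 1 -> 3, 3 -> 2.
w = {(0, 3): 1, (3, 1): 1, (1, 3): 1, (3, 2): 1}
small = path_accessibility(4, w, Fraction(1, 2))
large = path_accessibility(4, w, Fraction(2))
```

For the triple $i=0$, $j=1$, $k=2$ the two sides are $\tau^4$ and $\tau^2$, which makes the dependence on $\tau$ explicit; path enumeration is exponential in general, so this approach suits only small digraphs.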
Consider a digraph Γ with arc weights w p ij ∈ (0, 1] interpreted as the intactness probabilities of the arcs. Define p ij to be the i → j connection reliability, i.e., the probability that at least one path from i to j remains intact, provided that the arc failures are independent. Let P = (p ij ) be the matrix of connection reliabilities for all pairs of vertices. For every j ∈ V, p jj = 1, because the j → j path of length 0 is always intact.
The connection reliabilities can be represented as follows (see, e.g., [17, p. 10]):
$$p_{ij}=\sum_{k}\Pr(P_k)-\sum_{k<t}\Pr(P_kP_t)+\sum_{k<t<l}\Pr(P_kP_tP_l)-\cdots+(-1)^{m+1}\Pr(P_1P_2\cdots P_m),\tag{18}$$
where $P_1,P_2,\ldots,P_m$ are all $i\to j$ paths in $\Gamma$, $\Pr(P_k)=w(P_k)$, $\Pr(P_kP_t)=w(P_k\cup P_t)$, $P_k\cup P_t$ is the subdigraph of $\Gamma$ containing those arcs that belong to $P_k$ or $P_t$, and so forth. By virtue of (18), connection reliability is a modification of path accessibility that takes into account the degree of overlap for various paths between vertices.
Theorem 4. For any digraph Γ with arc weights w p ij ∈ (0, 1], the matrix P = (p ij ) of connection reliabilities determines a transitional measure for Γ.
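Proof. Let $E_{ij}$ be the event that at least one path connecting $i$ to $j$ remains intact. Then, since $E_{ij}\wedge E_{jk}\Rightarrow E_{ik}$, by the independence assumption we have
$$p_{ik}=\Pr(E_{ik})\ge\Pr(E_{ij}\wedge E_{jk})\ge\Pr(E_{ij})\Pr(E_{jk})=p_{ij}\,p_{jk}=p_{ij}\,p_{jk}\,p_{jj}^{-1},$$
with the equality if and only if every path from $i$ to $k$ contains $j$.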
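For very small digraphs, connection reliabilities can be computed directly from the definition by enumerating all intact/failed states of the arcs, which gives a way to check the statement of Theorem 4 on examples. The sketch below is only an illustration: the arc list format `(tail, head, probability)`, the function name, and the two three-vertex digraphs are hypothetical, and exact `Fraction` arithmetic is used so that equality can be tested.

```python
from fractions import Fraction
from itertools import product

def connection_reliability(n, arcs):
    """P = (p_ij): probability that some i -> j path is intact, arcs failing independently.

    arcs is a list of (tail, head, intactness probability); the cost is 2 ** len(arcs).
    """
    P = [[Fraction(0)] * n for _ in range(n)]
    for state in product([False, True], repeat=len(arcs)):
        prob = Fraction(1)
        for (a, b, p), intact in zip(arcs, state):
            prob *= p if intact else 1 - p
        if prob == 0:
            continue
        for i in range(n):
            seen, stack = {i}, [i]         # vertices reachable from i over intact arcs
            while stack:
                u = stack.pop()
                for (a, b, _), intact in zip(arcs, state):
                    if intact and a == u and b not in seen:
                        seen.add(b)
                        stack.append(b)
            for j in seen:
                P[i][j] += prob
    return P

half = Fraction(1, 2)
# With the arc 0 -> 2, some 0 -> 2 path avoids vertex 1; without it, every such path uses 1.
with_shortcut = connection_reliability(3, [(0, 1, half), (1, 2, half), (0, 2, half)])
without_shortcut = connection_reliability(3, [(0, 1, half), (1, 2, half)])
```

In the first digraph $p_{01}\,p_{12}<p_{02}\,p_{11}$, while in the second the two sides coincide, in line with the graph bottleneck identity.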
Corollary 1 (of Theorems 2, 3, and 4). For any strong digraph $\Gamma$, the logarithmic distances corresponding to the matrix $S=(s_{ij})$ defined by (15) with a sufficiently small $\tau$ and to the matrix $P=(p_{ij})$ of connection reliabilities (whenever $w_{ij}^p\in(0,1]$) are graph-geodetic for $\Gamma$.
Proof. Since for a strong digraph $\Gamma$, the matrices $S$ and $P$ have positive off-diagonal entries, the desired statements follow from Theorems 3, 4, and 2.
The next section is devoted to the transitional measures in which the diagonal elements s(i, i) measure the (relative) strength of connections of every vertex to itself.
The following theorem is the main technical result of this paper.
Theorem 5. For any digraph Γ, the matrix of in-forests F = (f ij ) defined by (1) determines a transitional measure for Γ.
There seems to be no easy way to construct a direct bijective proof of Theorem 5 (such as the proofs of Theorems 3 and 6). So we present an indirect proof relying on Proposition 2 and Theorem 6 given below. We will use the following construction.
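For a fixed digraph $\Gamma$, let us choose an arbitrary $\varepsilon>0$ such that
$$\varepsilon\le\bigl(\max_{1\le i\le n}\ell_{ii}\bigr)^{-1},\tag{19}$$
where $L=(\ell_{ij})$ is the Laplacian matrix of $\Gamma$, whose diagonal entries are always non-negative (see (2)). It is easy to see that the matrix
$$P=(p_{ij})=I-\varepsilon L\tag{20}$$
(not to be confused with the matrix $P$ of Section 5.2) is row stochastic: $0\le p_{ij}\le1$ and $\sum_{k=1}^{n}p_{ik}=1$, $i,j=1,\ldots,n$.
Denote by $\tilde\Gamma$ a weighted multidigraph with loops whose matrix of total arc weights is
$$(1+\varepsilon)^{-1}P.\tag{21}$$
$\tilde\Gamma$ can be constructed as follows: every vertex $i$ of $\tilde\Gamma$ gets a loop with weight $(1+\varepsilon)^{-1}p_{ii}$; the remaining arcs of $\tilde\Gamma$ are the same as in $\Gamma$, their weights being equal to the corresponding weights in $\Gamma$ multiplied by $(1+\varepsilon)^{-1}\varepsilon$.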
Recall that a v 0 → v k route (also called a walk ) in a multidigraph with loops is an arbitrary alternating sequence of vertices and arcs v 0 , a 1 , v 1 , . . . , a k , v k where each a i is a v i-1 → v i arc. The length of a route is the number k of its arcs (including loops). The weight of a route is the product of the k weights of its arcs (including repeated arcs). By definition, for every vertex v 0 , there is a v 0 → v 0 route v 0 with length 0 and weight 1.
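Let $r_{ij}$ be the weight of the set $\mathcal R_{ij}$ of all $i\to j$ routes in $\tilde\Gamma$, provided that this weight is finite (note that in the presence of loops $\mathcal R_{ij}$ is infinite whenever $j$ is reachable from $i$). $R=(r_{ij})_{n\times n}$ will denote the matrix of the route weights.
Proposition 2. For any digraph $\Gamma$ and any $\varepsilon>0$ that satisfies (19), the matrix $R$ of the route weights in $\tilde\Gamma$ is finite and positively proportional to the matrix $F$ of in-forests of $\Gamma$.
Proof. By (21), for each $k=0,1,2,\ldots$, the matrix of the weights of $k$-length routes in $\tilde\Gamma$ is $\bigl((1+\varepsilon)^{-1}P\bigr)^{k}$. Therefore, the matrix $R$, whenever it exists, can be represented as follows:
$$R=\sum_{k=0}^{\infty}\bigl((1+\varepsilon)^{-1}P\bigr)^{k}.\tag{22}$$
Since the spectral radius of $P$ is 1 and $0<(1+\varepsilon)^{-1}<1$, the series in (22) converges to a finite matrix; therefore (22), (20), (3), and (4) imply
$$R=\bigl(I-(1+\varepsilon)^{-1}P\bigr)^{-1}=\bigl(I-(1+\varepsilon)^{-1}(I-\varepsilon L)\bigr)^{-1}=(1+\varepsilon^{-1})\,(I+L)^{-1}=(1+\varepsilon^{-1})\,Q=(1+\varepsilon^{-1})\,f^{-1}F,$$
which completes the proof.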
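The proportionality stated in Proposition 2 can be checked numerically by truncating the series of route weights. The sketch below rests on an assumption about the construction: the row stochastic matrix is taken to be $P=I-\varepsilon L$, so that the matrix of total arc weights of the auxiliary multidigraph is $(1+\varepsilon)^{-1}(I-\varepsilon L)$. Under this reading the sum of the series equals $(1+\varepsilon^{-1})(I+L)^{-1}$, and the code tests the equivalent relation $R\,(I+L)\approx(1+\varepsilon^{-1})\,I$. The function names, the number of terms, and the example Laplacian are hypothetical.

```python
def route_weight_matrix(L, eps, terms=200):
    """Truncated series R = sum_k A^k with A = (I - eps * L) / (1 + eps)."""
    n = len(L)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Assumed matrix of total arc weights of the auxiliary multidigraph with loops.
    A = [[(I[i][j] - eps * L[i][j]) / (1 + eps) for j in range(n)] for i in range(n)]
    R = [row[:] for row in I]              # k = 0 term: routes of length 0
    power = [row[:] for row in I]
    for _ in range(terms):
        power = [[sum(power[i][k] * A[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
        R = [[R[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return R

def max_deviation(R, L, eps):
    """Largest entry of |R (I + L) - (1 + 1/eps) I|."""
    n = len(L)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(R[i][k] * ((1.0 if k == j else 0.0) + L[k][j]) for k in range(n))
            target = (1 + 1 / eps) if i == j else 0.0
            worst = max(worst, abs(entry - target))
    return worst

# Laplacian of a digraph with arcs 0->1 (weight 2), 1->2 (3), 2->0 (1), 0->2 (1);
# eps = 0.25 does not exceed 1 / max_i l_ii = 1/3.
L = [[3, -2, -1],
     [0, 3, -3],
     [-1, 0, 1]]
eps = 0.25
R = route_weight_matrix(L, eps)
```

Since the terms of the series decay at least like $(1+\varepsilon)^{-k}$, 200 terms are ample for this example; a smaller $\varepsilon$ would need more terms for the same accuracy.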
(cf. ( 2)-( 3)). By the matrix forest theorem, Q ′ does exist and q ′ ij = f ′ ij /f ′ , i, j = 1, . . . , n, where f ′ is the total weight of the out-forests in Γ (f ′ = w( F •→ )) and f ′ ij the total weight of out-forests having j in a weak component rooted at i (f ′ ij = w( F i•→j )). From these definitions it follows that F ′ is the transposed matrix F of the reverse digraph Γ -1 . Therefore, by Theorem 5, F ′ determines a transitional measure for Γ and, in view of Theorem 2, the corresponding logarithmic distance is graph-geodetic for Γ. It is worth noting that the logarithmic distances produced by F and F ′ are generally different.
Finally, we touch upon the case of undirected graphs. This case is also considered in [2].
For undirected multigraphs, the definitions of transitional measure and logarithmic distance are completely similar to Definitions 1 and 2, and the above theorems have undirected counterparts. In this section, we present the least obvious result of this kind, which concerns spanning forests.
Corollary 3 (of Theorem 5). Let G be a connected weighted undirected multigraph and let f ij , i, j ∈ V (G), be the total weight of the spanning rooted forests of G that have vertex i belonging to a tree rooted at j. Then:
1. The matrix F = (f ij ) determines a transitional measure for G; 2. The logarithmic distance corresponding to F = (f ij ) is graph-geodetic for G.
Proof. 1. Consider the symmetric multidigraph Γ obtained from G by replacing every edge by two opposite arcs carrying the weight of that edge. Then comparing the matrix forest theorems for directed and undirected graphs [6] yields f ij (G) = f ij (Γ), i, j ∈ V (G). Observe that for every i, j, k ∈ V (G), every path from i to k contains j if and only if so does every directed path from i to k in Γ. Therefore, by virtue of Theorem 5, F = (f ij ) determines a transitional measure for G. Item 2 follows from item 1 of Corollary 2.
Inequality (8) also holds for every matrix $S$ that, with no relation to graphs, obeys the strengthened transition inequality, which is (5) turning into the strict form whenever $k=i$ and $j\ne i$. It follows from the proof of Theorem 1 that if such a matrix has positive off-diagonal entries, then it produces a distance by means of (9) and (10).
On counting routes, see also [11,8]. Related finite topological representations that involve paths are obtained in [15]. For some connections with matroid theory, we refer to [16].