A Well-Behaved Alternative to the Modularity Index


📝 Original Info

  • Title: A Well-Behaved Alternative to the Modularity Index
  • ArXiv ID: 1108.4658
  • Date: 2011-08-24
  • Authors: Linton C. Freeman

📝 Abstract

This paper reviews the modularity index and suggests an alternative index of the quality of a division of a network into subsets.


📄 Full Content

A major thrust in current network research involves the development of algorithms for uncovering clusters in network data. Network data come in the form of a graph G = (V, E) where V is a set of vertices and E is a set of unordered pairs of vertices, called edges. All these algorithms seek to partition the vertices in V into m > 1 subsets within which vertex-vertex connections are dense, but between which such connections are sparse. Social scientists call such subsets groups, but to physicists they are communities and computer scientists label them clusters. But, regardless of the labeling, all these fields are concerned with the same structural form. Here the neutral term cohesive subsets will be used to refer to structures of this sort.

None of these cluster-finding algorithms is guaranteed to divide the vertices into subsets that actually have the properties of cohesive subsets. As Newman and Girvan [1] put it:

Our algorithms always produce some division of the network into communities, even in completely random networks that have no meaningful community structure, so it would be useful to have some way of saying how good the structure found is.

An index of the quality of the results of a cohesive subset-finding algorithm must take two properties of the subset into account:

(1) The frequency of external edges, those that link pairs of vertices that fall in different subsets.

(2) The frequency of internal edges, those that link pairs of vertices both of which fall within a subset.

Such an index should decline in the face of an increasing number of external edges and it should grow in the presence of an increasing number of internal edges. Moreover, it should not be affected by extraneous factors that do not bear directly on these two kinds of edges.

A widely used index of “the quality of a particular division of a network” is Newman and Girvan’s modularity, Q [1]. Q focuses on the ties that fall within subsets. Let

$$Q = \sum_{i=1}^{m} \left[ e_i - E(d_i)^2 \right]$$

Here $e_i$ is the fraction of the network's edges that fall within subset i, and $E(d_i)^2$ is the expected fraction of within-subset edges under the assumption that the edges are generated at random, conditional upon maintaining the observed degree distribution of the vertices in G.
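As a minimal sketch of how Q can be computed, the function below uses the standard Newman-Girvan form: for each subset, the observed fraction of within-subset edges minus the square of that subset's share of edge ends, which is the expectation under degree-preserving randomization. The function name `modularity`, its arguments, and the two-triangle example graph are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs.
    membership: dict mapping each vertex to its subset label.
    """
    edges = list(edges)
    total = len(edges)
    if total == 0:
        raise ValueError("Q is undefined for a graph with no edges")

    internal = defaultdict(int)    # edges with both ends in the subset
    degree_sum = defaultdict(int)  # edge ends attached to the subset
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Observed within-subset fraction minus its expected value, per subset.
    return sum(
        internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
        for c in degree_sum
    )


# Hypothetical example: two disjoint triangles, one per subset.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, membership))  # 0.5
```

Working from per-subset totals rather than a full adjacency matrix keeps the sketch close to the two quantities in the definition.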

With respect to external edge frequencies, Q performs properly. To see this, consider the two-cluster partitioning shown in Figure 1. It is a perfect case of the cohesive subsets we are seeking: no external edges link the two clusters, and all possible internal edges are present within each cluster. For the subsets shown in Figure 1, Q yields a value of 0.50.

Now consider the two cohesive subsets shown in Figure 2. There, a new external edge is displayed, one that crosses between the two clusters. In that case, Q is reduced to 0.468, and the reduction continues as more external edges are added. Figure 3 shows the decline in the values of Q as additional cross-cutting external edges are introduced.

Q, however, does not perform as well with respect to several other structural properties [2,3]. Consider first internal edges. In the perfect cohesive subsets shown in Figure 1, Q = 0.50, and all the possible internal edges in each cluster are present. Now suppose we remove one edge from each of the clusters, as shown in Figure 4. Ideally, the value of Q should decline. However, the value of Q for the partitioning shown in Figure 4 is still 0.50; removing those edges did not decrease it. Moreover, if we continue to remove edges systematically, one from each cluster, Q continues to produce a value of 0.50 even when only one edge remains on each side of the split. Although cohesive subset-finding seeks partitions with many internal edges, Q turns out to be indifferent to the presence or absence of internal edges.
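The behavior described above can be checked from edge counts alone. The sketch below is hedged: the figures are not reproduced here, so it assumes each cluster is a complete graph on six vertices (15 internal edges), a size chosen only because it reproduces the reported values of 0.50 and 0.468. The helper name `q_two_subsets` is hypothetical.

```python
def q_two_subsets(k1, k2, x):
    """Q for a two-subset partition, computed from edge counts alone.

    k1, k2: internal edges in each subset; x: external edges between them.
    """
    total = k1 + k2 + x
    q = 0.0
    for k in (k1, k2):
        e = k / total                  # observed within-subset fraction
        a = (2 * k + x) / (2 * total)  # subset's share of all edge ends
        q += e - a ** 2
    return q


# Assumption: each cluster is a complete graph on 6 vertices, so 15 edges.
print(round(q_two_subsets(15, 15, 0), 3))  # 0.5, complete and separate
print(round(q_two_subsets(15, 15, 1), 3))  # 0.468, one external edge added

# Remove internal edges symmetrically; Q stays at 0.5 throughout.
for k in (14, 10, 5, 1):
    print(k, round(q_two_subsets(k, k, 0), 3))
```

With no external edges and equal internal counts, each subset contributes 1/2 - 1/4 regardless of how many edges it holds, which is why the symmetric removals leave Q unchanged.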

In addition, the values produced by Q are confounded by two extraneous factors: the number of subsets and inequalities in the numbers of edges they contain. The number of cohesive subsets being evaluated constrains the range of values that Q can produce. Where m is the number of cohesive subsets presented to Q, the upper limit that Q can reach is (m - 1)/m. That is why the perfect partitioning displayed in Figure 1 produces a value of only Q = 0.50. In contrast, if instead of the two cohesive subsets displayed in that figure there were 100 identical cohesive subsets, then Q would take a value of 0.99. In both cases, however, the partitioning is equally flawless.
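The bound can be derived in a few lines. As an assumption for this sketch, write $a_i$ for subset i's share of all edge ends, so that $a_i^2$ plays the role of the expected within-subset fraction in Q, and $\sum_i a_i = 1$. Since the observed within-subset fractions sum to at most 1,

$$Q = \sum_{i=1}^{m} e_i - \sum_{i=1}^{m} a_i^2 \le 1 - \sum_{i=1}^{m} a_i^2 \le 1 - \frac{1}{m} = \frac{m-1}{m},$$

where the last inequality follows from the Cauchy-Schwarz inequality, $\sum_i a_i^2 \ge \frac{1}{m}\left(\sum_i a_i\right)^2$. For m identical subsets with no external edges, $e_i = a_i = 1/m$ and the bound is attained:

$$Q = m\left(\frac{1}{m} - \frac{1}{m^2}\right) = \frac{m-1}{m},$$

which gives 0.50 for m = 2 and 0.99 for m = 100.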

The impact of inequalities in the numbers of edges contained in the subsets is equally confounding. The value produced by Q is reduced to the degree that the clusters with which it is confronted contain differing numbers of edges. Figure 1 and Figure 5, for example, both display perfect partitionings. Both involve graphs that display two cohesive subsets that contain all internal edges and no external edges. But while the data of Figure 1 produce Q = 0.50, for the data of Figure 5, Q is reduced


Reference

This content is AI-processed based on open access ArXiv data.
