Influence maximization in social networks plays a vital role in applications such as viral marketing, epidemiology, product recommendation, opinion mining, and counter-terrorism. A common approach identifies seed nodes by first detecting disjoint communities and then selecting representative nodes from these communities. However, whether the quality of the detected communities consistently affects the spread of influence under the Independent Cascade model remains unclear. This paper addresses this question by extending a previously proposed disjoint community detection method, termed $α$-Hierarchical Clustering, to the influence maximization problem under the Independent Cascade model. The proposed method is compared with an alternative approach that employs the same seed selection criteria but relies on communities of lower quality obtained through standard Hierarchical Clustering. The proposed method, which leverages higher-quality community structures to guide seed selection, is termed $α$-Hierarchical Clustering-based Influence Maximization, while the alternative is referred to as Hierarchical Clustering-based Influence Maximization. Extensive experiments are performed on multiple real-world datasets to assess the effectiveness of both methods. The results demonstrate that higher-quality community structures substantially improve information diffusion under the Independent Cascade model, particularly when the propagation probability is low. These findings underscore the importance of community quality in guiding effective seed selection for influence maximization in complex networks.
Influence in social networks has gained significant attention in recent years [1]. This growing interest has motivated extensive research aimed at understanding how influence propagates, including the identification of key structural properties such as opinion leaders and communities of interest. To this end, researchers have applied various data mining techniques, including community detection and Influence Maximization (IM). The concept of IM was first introduced by Domingos et al. [2] in the context of viral marketing. Later, Kempe et al. [3] formalized IM as an optimization problem: given a positive integer k, the goal is to identify a seed set S of k influential nodes that maximizes the expected spread of influence. Kempe et al. also demonstrated that the IM problem is NP-hard and proposed two widely adopted diffusion models: the Linear Threshold Model (LTM) and the Independent Cascade Model (IC). In the IC model, influence spreads over a directed graph G = (V, E) where each edge (u, v) is associated with an activation probability $p_{uv}$. Starting from an initial seed set S, the diffusion unfolds in discrete time steps. At each step $t_i$ (i ≥ 0), newly activated nodes are each given one chance to independently activate their currently inactive neighbors. A node u attempts to activate a neighbor v with probability $p_{uv}$; if successful, v becomes active at time $t_{i+1}$. Regardless of success or failure, u cannot attempt to activate v again. The process terminates when no more activations are possible [4]. Despite its importance, IM remains challenging due to both computational complexity and the difficulty of selecting high-quality seed nodes. Kempe et al. [3] proposed the first Greedy algorithm for IM, which iteratively adds the node that provides the largest marginal gain in influence spread. They proved that this algorithm achieves a $(1 - 1/e) \approx 0.63$ approximation of the optimal solution.
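The IC process and the Greedy selection described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper. It assumes a single propagation probability p shared by all edges, a graph stored as a dictionary of out-neighbor lists, and a Monte Carlo estimate of the expected spread; the names `simulate_ic`, `expected_spread`, `greedy_im`, and `toy_graph` are hypothetical and introduced only for illustration.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one Independent Cascade realization; return the set of active nodes.

    graph: dict mapping every node to a list of its out-neighbors.
    p:     one propagation probability for all edges (a simplifying assumption).
    """
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous time step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # u gets a single chance to activate each inactive neighbor v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, p, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


def greedy_im(graph, k, p, runs=200):
    """Greedy seed selection: add the node with the largest estimated spread."""
    seeds = []
    for _ in range(k):
        best_node, best_spread = None, -1.0
        for v in graph:
            if v in seeds:
                continue
            spread = expected_spread(graph, seeds + [v], p, runs)
            if spread > best_spread:
                best_node, best_spread = v, spread
        seeds.append(best_node)
    return seeds


# Hypothetical toy graph with two weakly separated groups of nodes.
toy_graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [5], 5: []}

# With p = 1.0 the cascade is deterministic, so the result is [0, 4].
print(greedy_im(toy_graph, k=2, p=1.0, runs=5))
```

Every candidate evaluation inside `greedy_im` triggers a full batch of simulations, so the cost grows with the number of nodes, the seed budget k, and the number of runs.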
However, the Greedy method is computationally demanding because it repeatedly simulates the diffusion process. To improve efficiency, Leskovec et al. [5] introduced the Cost-Effective Lazy Forward (CELF) algorithm, which accelerates the Greedy procedure by a factor of about 700. Subsequently, Goyal et al. [6] proposed CELF++, which further improves runtime performance. Borgs et al. [7] then introduced the Reverse Influence Sampling (RIS) framework, providing a more scalable alternative to Greedy-based methods. Beyond these approaches, several IM methods rely on community detection. In social networks, a community is a group of users who share common interests or characteristics. In graph theory, it refers to a set of vertices that are densely connected internally but sparsely connected to the rest of the graph. Community-based IM methods exploit these structural properties to guide seed selection. For example, Bouyer et al. [8] proposed a method using overlapping communities to select influential overlapping nodes. Chen et al. [9] introduced the Community-based Influence Maximization (CIM) framework, which uses Hierarchical Clustering (HC) under the Heat Diffusion Model (HDM) [10]. Numerous recent influence maximization approaches rely on community detection as a foundation, including [11][12][13][14]. Recently, I improved the HC method by proposing a new similarity measure called α-Structural Similarity (α-2S), which accounts for both common neighbors and the interconnection between them. This led to the development of the α-Hierarchical Clustering (α-HC) algorithm, which replaces the traditional Structural Similarity (2S) measure with α-2S [15]. However, it remained unclear whether the higher-quality community structures produced by α-HC consistently affect the spread of influence and lead to more effective seed selection under the Independent Cascade (IC) model [3]. 
To address this question, I extend the α-HC framework to the influence maximization problem and introduce two methods for comparison: Hierarchical Clustering-based Influence Maximization (HCIM), which relies on standard HC with lower-quality communities, and α-Hierarchical Clustering-based Influence Maximization (α-HCIM), which leverages the higher-quality community structures detected by α-HC to guide seed selection.
The remainder of the paper is structured as follows. Section 2 introduces the formal notations. Section 3 presents the proposed α-HCIM approach. Section 4 describes my experiments on real-world social networks. Finally, Section 5 concludes the paper and outlines future research directions.
Consider an undirected graph G = (V, E), where V is the set of vertices, and E is the set of edges. The set of neighbors of a node u ∈ V is defined as
In this section, the α-HC method [15] is extended to the influence maximization problem; the resulting method is referred to as α-HCIM. Overlapping nodes can be added to the seed set because they link several communities and thereby contribute to the effective propagation of information. However, not al