A Model for Spatial Outlier Detection Based on Weighted Neighborhood Relationship
Spatial outlier detection discovers inconsistent objects and thereby yields implicit, hidden, and interesting knowledge that plays an effective role in the decision-making process. In this paper, we propose a model that redefines the spatial neighborhood relationship by weighting the most influential parameters of neighboring objects in a given spatial data set. The spatial parameters considered are distance, cost, and the number of direct connections between neighboring objects. The model can be adapted to polygonal objects. It is applied to a GIS system supporting a literacy project in Fayoum governorate.
💡 Research Summary
The paper addresses a fundamental limitation in existing spatial outlier detection techniques: the assumption that all neighboring objects exert equal influence on a target object. To overcome this, the authors propose a new definition of spatial neighborhood that incorporates a weight parameter reflecting the strength of each neighbor’s effect. Three spatial factors are considered—geographic distance (D), the number of direct connections (R), and the minimal travel cost (C). Each factor is transformed into a normalized weight: distance contributes inversely (1/D), connections contribute directly (R), and cost contributes inversely (1/C). User‑defined coefficients α, β, and δ (summing to 1) control the relative importance of these factors, allowing the model to be tuned for different applications.
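The weighting idea can be sketched in a few lines of Python. The summary does not reproduce the paper's combined formula, so the form used here is an assumption: each factor is normalized to sum to 1 over the neighbors, and the three normalized components are mixed with α, β, and δ. The function names `normalize` and `combined_weights` are hypothetical.

```python
def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize: all values are zero")
    return [v / total for v in values]


def combined_weights(distances, connections, costs, alpha, beta, delta):
    """Return one weight per neighbor, summing to 1.

    Assumed form (not taken from the paper):
    W_i = alpha * norm(1/D_i) + beta * norm(R_i) + delta * norm(1/C_i)
    """
    if abs(alpha + beta + delta - 1.0) > 1e-9:
        raise ValueError("alpha + beta + delta must equal 1")
    if any(d <= 0 for d in distances) or any(c <= 0 for c in costs):
        # Zero distance or cost makes the inverse undefined.
        raise ValueError("distances and costs must be positive")

    w_dist = normalize([1.0 / d for d in distances])   # closer -> larger
    w_conn = normalize(connections)                    # more links -> larger
    w_cost = normalize([1.0 / c for c in costs])       # cheaper -> larger

    return [
        alpha * wd + beta * wr + delta * wc
        for wd, wr, wc in zip(w_dist, w_conn, w_cost)
    ]


# Illustrative inputs only; these are not values from the paper.
weights = combined_weights(
    distances=[1.0, 2.0, 4.0],
    connections=[2, 1, 1],
    costs=[1.0, 2.0, 4.0],
    alpha=0.5, beta=0.25, delta=0.25,
)
```

Because each component sums to 1 and the coefficients sum to 1, the combined weights also sum to 1 without a further normalization step.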
Mathematically, the classic expectation formula E(r)=∑F(i)/N is replaced by a weighted expectation E(r)=∑Wri F(i), where the weights Wri satisfy ∑Wri=1. The weight calculation is expressed in a series of equations (3)–(6), culminating in a combined formula that merges distance, connectivity, and cost. This formulation eliminates the traditional separation between distance‑based and graph‑based outlier methods, enabling a unified treatment of both.
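The difference between the two expectations can be illustrated with a short Python sketch. The function names and the sample numbers are hypothetical and only show how unequal weights shift the expected value toward the more influential neighbors.

```python
def classic_expectation(neighbor_values):
    """E(r) = sum(F(i)) / N: every neighbor counts equally."""
    return sum(neighbor_values) / len(neighbor_values)


def weighted_expectation(weights, neighbor_values):
    """E(r) = sum(W_ri * F(i)), with the weights required to sum to 1."""
    if len(weights) != len(neighbor_values):
        raise ValueError("one weight per neighbor is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * f for w, f in zip(weights, neighbor_values))


# Illustrative attribute values for three neighbors (not from the paper).
values = [10.0, 20.0, 30.0]
equal = classic_expectation(values)
skewed = weighted_expectation([0.5, 0.3, 0.2], values)
```

With equal weights of 1/N the weighted form reduces to the classic one, so the classic model is a special case of the weighted model.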
The authors adapt Shekhar’s spatial outlier framework to incorporate weighted neighborhoods. They define a spatial framework SF=(S, NB) with a weighted neighbor relation NB(S, S, W). For each object x, S(x) is the difference between the object’s attribute value and the weighted average of its neighbors, and μS and σS are the mean and standard deviation of S(x) over all objects. An object is declared an outlier if the absolute Z‑score |S(x)−μS|/σS exceeds a threshold Θ (e.g., Θ=2, which corresponds to roughly 95% confidence if S(x) is approximately normally distributed).
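A minimal sketch of this test in Python follows. It assumes the neighbor weights are already computed and sum to 1 per object, and it uses the population standard deviation; the summary does not say whether the paper uses the population or the sample form. The function name `weighted_outliers` and the ring-shaped toy data are hypothetical.

```python
from statistics import mean, pstdev


def weighted_outliers(values, neighbors, theta=2.0):
    """Return the sorted ids of objects whose |Z-score| exceeds theta.

    values:    dict mapping object id -> attribute value
    neighbors: dict mapping object id -> list of (neighbor id, weight),
               where the weights of each object sum to 1
    """
    # S(x): attribute value minus the weighted average of the neighbors.
    diffs = {}
    for obj, nbrs in neighbors.items():
        expected = sum(w * values[n] for n, w in nbrs)
        diffs[obj] = values[obj] - expected

    all_diffs = list(diffs.values())
    mu = mean(all_diffs)
    sigma = pstdev(all_diffs)  # assumption: population standard deviation
    if sigma == 0:
        return []  # every object matches its neighborhood equally well

    return sorted(
        obj for obj, d in diffs.items() if abs(d - mu) / sigma > theta
    )


# Toy data: ten objects on a ring, one of which deviates strongly.
values = {i: 10.0 for i in range(10)}
values[0] = 50.0
neighbors = {
    i: [((i - 1) % 10, 0.5), ((i + 1) % 10, 0.5)] for i in range(10)
}
flagged = weighted_outliers(values, neighbors, theta=2.0)
```

In this toy case only object 0 is flagged. Its two neighbors also receive non-zero S(x) values because the deviating object pulls their weighted averages up, but their Z-scores stay below the threshold.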
To demonstrate practicality, the model is applied to a GIS dataset from the Egyptian Center for Women’s Rights’ literacy project in Fayoum governorate. The dataset consists of 167 villages (polygons) grouped around six cities, each with percentages of female illiteracy. Neighbors are defined as adjacent polygons sharing a boundary; distances are measured between polygon centroids, and polygon area is used as an additional factor. For a sample village (ID 27) with seven neighbors, the weighted model assigns about 41 % of total influence to the nearest neighbor (ID 29) and only 5 % to the farthest (ID 42). The classic model would allocate equal weight (~14.3 %) to each neighbor. Consequently, the weighted model predicts a female illiteracy rate of 28 % (close to the actual 26 %), whereas the classic model predicts 45 %, a much larger error.
Performance is evaluated using mean‑square error (MSE). In some cases the weighted model reduces MSE by up to 98.9 %, and on average by about 8 % compared with the classic approach. The set of detected outlier villages also differs: the weighted model identifies additional outliers (IDs 511, 302, 239) that the classic method misses, while the classic method flags villages (IDs 28, 29) that are not outliers under the weighted scheme because their nearest neighbors have similar illiteracy rates.
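The comparison can be sketched as follows. The summary does not state how the percentage reduction is computed, so the relative form (classic − weighted) / classic is an assumption, and the function names and numbers are illustrative rather than the paper's data.

```python
def mse(actual, predicted):
    """Mean-square error between actual and predicted attribute values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(mse_classic, mse_weighted):
    """Fraction by which the weighted model lowers the classic MSE.

    Assumed definition: (classic - weighted) / classic.
    """
    if mse_classic == 0:
        raise ValueError("classic MSE is zero; reduction is undefined")
    return (mse_classic - mse_weighted) / mse_classic


# Illustrative values only (not taken from the paper).
actual = [26.0, 40.0, 35.0]
classic_pred = [30.0, 44.0, 31.0]
weighted_pred = [28.0, 42.0, 33.0]

reduction = relative_reduction(
    mse(actual, classic_pred), mse(actual, weighted_pred)
)
```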
The paper concludes that incorporating weighted spatial relationships significantly improves outlier detection accuracy and provides a flexible framework adaptable to various spatial contexts. However, it acknowledges open issues such as systematic selection of α, β, δ, handling of zero or near‑zero costs, and more sophisticated neighbor definitions for complex polygon geometries. Future work is suggested to address these challenges and to validate the approach across diverse domains such as transportation, ecology, and location‑based services.