$k^{\tau,\epsilon}$-anonymity: Towards Privacy-Preserving Publishing of Spatiotemporal Trajectory Data
Mobile network operators can track subscribers via passive or active monitoring of device locations. The recorded trajectories offer an unprecedented outlook on the activities of large user populations, which enables developing new networking solutions and services, and scaling up studies across research disciplines. Yet, the disclosure of individual trajectories raises significant privacy concerns: thus, these data are often protected by restrictive non-disclosure agreements that limit their availability and impede potential usages. In this paper, we contribute to the development of technical solutions to the problem of privacy-preserving publishing of spatiotemporal trajectories of mobile subscribers. We propose an algorithm that generalizes the data so that they satisfy $k^{\tau,\epsilon}$-anonymity, an original privacy criterion that thwarts attacks on trajectories. Evaluations with real-world datasets demonstrate that our algorithm attains its objective while retaining a substantial level of accuracy in the data. Our work is a step forward in the direction of open, privacy-preserving datasets of spatiotemporal trajectories.
💡 Research Summary
The paper tackles the problem of publishing mobile‑operator collected spatiotemporal trajectories in a way that protects subscriber privacy while preserving data utility. Existing privacy mechanisms such as simple pseudonymisation, classic k‑anonymity, or differential privacy are ill‑suited for this domain because operator data are irregularly sampled, span long periods, and contain highly distinctive movement patterns that make individuals uniquely identifiable even after coarse aggregation.
To address these challenges the authors first formalise two realistic attack models. The first, a “record‑linkage” attack, assumes an adversary can continuously observe a target subscriber for any time interval of length τ and thus knows the exact sequence of spatiotemporal samples within that window. The second, a “probabilistic” attack, assumes the adversary may already possess such a τ‑window and can additionally learn a further disjoint interval of length ε, thereby extending knowledge of the user’s trajectory. Both attacks exploit the inherent uniqueness of mobility traces.
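The record‑linkage attack can be illustrated with a minimal sketch. It assumes raw, ungeneralised trajectories stored as lists of `(time, x, y)` tuples; this layout, the helper names `window` and `linkage_candidates`, and the toy users are hypothetical and not taken from the paper. The adversary's knowledge is the set of samples falling in one τ‑window, and the attack returns every user whose trajectory is consistent with it.

```python
from typing import Dict, List, Tuple

# Hypothetical sample layout: (time in minutes, x, y).
Sample = Tuple[float, float, float]


def window(traj: List[Sample], start: float, tau: float) -> List[Sample]:
    """Samples of one trajectory with start <= t < start + tau."""
    return [s for s in traj if start <= s[0] < start + tau]


def linkage_candidates(dataset: Dict[str, List[Sample]],
                       known: List[Sample]) -> List[str]:
    """Users whose trajectory contains every sample the adversary observed."""
    return sorted(user for user, traj in dataset.items()
                  if all(s in traj for s in known))


# Toy data, invented for illustration only.
dataset = {
    "alice": [(0, 1, 1), (20, 2, 1), (45, 5, 3), (90, 5, 3)],
    "bob":   [(0, 1, 1), (20, 2, 1), (50, 7, 7), (90, 2, 2)],
    "carol": [(5, 9, 9), (30, 8, 9), (70, 8, 8)],
}

# A 30-minute window leaves two candidates; a 60-minute window isolates alice.
print(linkage_candidates(dataset, window(dataset["alice"], 0, 30)))
print(linkage_candidates(dataset, window(dataset["alice"], 0, 60)))
```

In this toy case a longer observation window shrinks the candidate set to a single user, which is the uniqueness effect the attack models rely on.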
From this threat model the authors introduce a novel privacy criterion called $k^{\tau,\epsilon}$‑anonymity. The criterion enforces two conditions: (i) for any τ‑length window, each subscriber’s trajectory segment must be indistinguishable from those of at least k‑1 other subscribers (k‑anonymity on the sliding window), and (ii) the extra information that can be inferred outside the τ‑window is bounded by a small leakage parameter ε. In other words, even if an attacker knows a continuous τ‑segment, they cannot pinpoint the user among fewer than k candidates, and any additional knowledge they can gain is limited to a short ε‑segment. When τ + ε covers the whole dataset, $k^{\tau,\epsilon}$‑anonymity collapses to ordinary k‑anonymity, showing that the new definition subsumes the classic one while adding protection against probabilistic attacks.
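Condition (i) can be checked mechanically on an already generalised dataset. The sketch below is a simplified checker, not the authors' procedure: it assumes generalised samples of the hypothetical form `(t_start, t_end, region_id)`, tests window starts only on a fixed grid (`step`), treats two users as indistinguishable when their samples overlapping the window are identical, and does not model the ε bound of condition (ii).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical generalised sample: (t_start, t_end, region id).
GSample = Tuple[int, int, str]


def view(traj: List[GSample], start: int, tau: int) -> Tuple[GSample, ...]:
    """Generalised samples overlapping the window [start, start + tau)."""
    return tuple(s for s in traj if s[0] < start + tau and s[1] > start)


def violations(dataset: Dict[str, List[GSample]],
               k: int, tau: int, step: int) -> List[Tuple[str, int]]:
    """(user, window start) pairs whose window view is shared by fewer than k users."""
    horizon = max(s[1] for traj in dataset.values() for s in traj)
    bad = []
    for start in range(0, horizon, step):
        views = {u: view(t, start, tau) for u, t in dataset.items()}
        # Empty views carry no adversary knowledge, so they are skipped.
        counts = Counter(v for v in views.values() if v)
        bad += [(u, start) for u, v in sorted(views.items())
                if v and counts[v] < k]
    return bad


# Toy data, invented for illustration only.
dataset = {
    "u1": [(0, 30, "A"), (30, 60, "B"), (60, 90, "C")],
    "u2": [(0, 30, "A"), (30, 60, "B"), (60, 90, "D")],
    "u3": [(0, 30, "A"), (30, 60, "B"), (60, 90, "C")],
}

print(violations(dataset, k=2, tau=30, step=30))
print(violations(dataset, k=2, tau=60, step=30))
```

An empty result means that every tested window satisfies the k‑indistinguishability requirement under these assumptions; in the toy data, `u2` is exposed in every window that includes its last sample.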
The core technical contribution is an efficient algorithmic framework that enforces $k^{\tau,\epsilon}$‑anonymity with minimal loss of spatial and temporal granularity. The authors define a generalisation cost for merging raw samples into a broader spatiotemporal slot: the cost is the product of temporal span and spatial area expansion. Using this cost they devise k‑merge, an optimal low‑complexity procedure that merges k (sub‑)trajectories into a single generalized trajectory while minimising the total cost. The algorithm works by constructing a bipartite matching between samples of different users and solving it with a greedy or Hungarian‑style method, guaranteeing that the merged segment is the cheapest possible representation that satisfies the k‑overlap requirement.
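The cost definition and the matching idea can be sketched for the two‑trajectory case. This is an illustrative greedy pairing under stated assumptions, not the authors' k‑merge: samples are represented as hypothetical boxes `(t0, t1, x0, x1, y0, y1)` with non‑zero extent, the cost of a generalised sample is read here as its temporal span times the area of its bounding rectangle, and samples left unpaired are simply dropped.

```python
from typing import List, Tuple

# Hypothetical spatiotemporal box: (t0, t1, x0, x1, y0, y1).
Box = Tuple[float, float, float, float, float, float]


def merge(a: Box, b: Box) -> Box:
    """Smallest spatiotemporal box covering both inputs."""
    return (min(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), max(a[3], b[3]),
            min(a[4], b[4]), max(a[5], b[5]))


def cost(box: Box) -> float:
    """Temporal span times spatial area of a generalised sample."""
    t0, t1, x0, x1, y0, y1 = box
    return (t1 - t0) * (x1 - x0) * (y1 - y0)


def greedy_pair(ta: List[Box], tb: List[Box]) -> List[Box]:
    """Pair each sample of ta with the cheapest unused sample of tb, then merge."""
    unused = list(tb)
    merged = []
    for a in ta:
        if not unused:
            break  # remaining samples of ta stay unpaired in this sketch
        best = min(unused, key=lambda b: cost(merge(a, b)))
        unused.remove(best)
        merged.append(merge(a, best))
    return merged


# Toy data, invented for illustration only.
ta = [(0, 10, 0, 1, 0, 1), (60, 70, 4, 5, 4, 5)]
tb = [(65, 75, 4, 6, 4, 5), (5, 15, 0, 1, 1, 2)]

merged = greedy_pair(ta, tb)
print(merged)
print(sum(cost(m) for m in merged))
```

A greedy pass is order‑dependent and is not guaranteed to minimise the total cost; an assignment solver over the full pairwise cost matrix would be the natural replacement when optimality matters.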
Building on k‑merge, the paper presents kte‑hide, a practical anonymisation pipeline. kte‑hide slides a window of length τ over each user’s trajectory, repeatedly applies k‑merge to obtain a set of k‑indistinguishable generalized segments, and suppresses any samples that cannot be merged without exceeding the ε‑leakage budget. The process is repeated for all users, ensuring that every τ‑segment in the final dataset participates in a k‑anonymous group and that the total additional exposure (ε) remains bounded.
The authors evaluate their approach on two real‑world mobile operator datasets containing millions of records. They vary k (2–10), τ (30 min to 2 h) and ε (5–30 min) and compare against baseline methods that enforce only k‑anonymity or add Laplacian noise for differential privacy. Metrics include average spatial error, temporal distortion, data loss ratio, and re‑identification success rate under simulated attacks. Results show that kte‑hide achieves the target $k^{\tau,\epsilon}$‑anonymity while keeping average spatial error 30–50 % lower than the k‑only baselines, and reduces re‑identification probability to below 5 % even for modest k values. Moreover, the utility loss remains acceptable for downstream analytics such as traffic flow estimation and hotspot detection.
The paper’s contributions are threefold: (1) a rigorous privacy model tailored to spatiotemporal trajectories that captures both continuous observation and limited leakage attacks, (2) an optimal merging algorithm (k‑merge) that minimises granularity loss while satisfying k‑overlap constraints, and (3) a complete end‑to‑end anonymisation system (kte‑hide) validated on large‑scale real data. Limitations are acknowledged: the choice of τ and ε is data‑dependent and may require domain expertise; very high k values can lead to excessive suppression; and the current formulation handles only two‑dimensional space, leaving extensions to altitude or multimodal sensor data for future work.
In summary, the work advances the state of the art in privacy‑preserving publishing of mobile subscriber trajectories by introducing a nuanced anonymity definition and delivering an efficient, provably optimal anonymisation technique that balances strong privacy guarantees with high data fidelity. This opens the door for more open sharing of valuable mobility datasets while respecting user privacy.