MISApp: Multi-Hop Intent-Aware Session Graph Learning for Next App Prediction
Predicting the next mobile app a user will launch is essential for proactive mobile services. Yet accurate prediction remains challenging in real-world settings, where user intent can shift rapidly within short sessions and user-specific historical profiles are often sparse or unavailable, especially under cold-start conditions. Existing approaches mainly model app usage as sequential behavior or local session transitions, limiting their ability to capture higher-order structural dependencies and evolving session intent. To address this issue, we propose MISApp, a profile-free framework for next app prediction based on multi-hop session graph learning. MISApp constructs multi-hop session graphs to capture transition dependencies at different structural ranges, learns session representations through lightweight graph propagation, incorporates temporal and spatial context to characterize session conditions, and captures intent evolution from recent interactions. Experiments on two real-world app usage datasets show that MISApp consistently outperforms competitive baselines under both standard and cold-start settings, while maintaining a favorable balance between predictive accuracy and practical efficiency. Further analyses show that the learned hop-level attention weights align well with structural relevance, offering interpretable evidence for the effectiveness of the proposed multi-hop modeling strategy.
💡 Research Summary
The paper introduces MISApp, a profile‑free framework for next‑app prediction that leverages multi‑hop session graphs to capture both immediate and higher‑order transition dependencies within a user’s current session. After segmenting raw app‑usage logs into sessions based on a 5‑minute inactivity threshold, the method constructs three directed graphs per session: 1‑hop (direct consecutive transitions), 2‑hop (two‑step indirect transitions), and 3‑hop (three‑step indirect transitions). Each graph is processed independently with a LightGCN‑style propagation that aggregates neighbor embeddings without feature transformation, preserving structural information while remaining computationally lightweight. After L propagation layers, node representations are averaged, and an attention mechanism extracts a graph‑level embedding for each hop.
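The session-graph pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the summary does not confirm: a gap strictly greater than five minutes starts a new session, a k-hop edge links apps that are k positions apart in the session, edges are unweighted during propagation, and each propagation step takes the mean over in-neighbors before the layer outputs are averaged. The function names (`split_sessions`, `hop_edges`, `propagate`) and the toy events are hypothetical.

```python
from collections import Counter

SESSION_GAP_SECONDS = 5 * 60  # 5-minute inactivity threshold from the summary


def split_sessions(events, gap=SESSION_GAP_SECONDS):
    """Split time-ordered (timestamp_seconds, app) events into sessions.

    Assumption: a gap strictly greater than `gap` starts a new session.
    """
    sessions, current, last_ts = [], [], None
    for ts, app in events:
        if last_ts is not None and ts - last_ts > gap:
            sessions.append(current)
            current = []
        current.append(app)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


def hop_edges(session, hop):
    """Count directed edges between apps `hop` positions apart (assumed reading)."""
    return Counter(zip(session, session[hop:]))


def propagate(embeddings, edges, num_layers=2):
    """LightGCN-style propagation: no weight matrices, no nonlinearity.

    Each layer replaces a node's vector with the mean of its in-neighbors'
    vectors (an assumed normalization); the result is the per-node average
    over layer 0 .. num_layers.
    """
    in_neighbors = {app: [] for app in embeddings}
    for src, dst in edges:
        in_neighbors[dst].append(src)
    dim = len(next(iter(embeddings.values())))
    layers = [embeddings]
    for _ in range(num_layers):
        prev, nxt = layers[-1], {}
        for app, nbrs in in_neighbors.items():
            if nbrs:
                nxt[app] = [sum(prev[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
            else:
                nxt[app] = [0.0] * dim  # nodes without in-edges receive no message
        layers.append(nxt)
    return {
        app: [sum(layer[app][d] for layer in layers) / len(layers) for d in range(dim)]
        for app in embeddings
    }


# Toy log: the 850-second gap before "music" opens a second session.
events = [(0, "chat"), (60, "mail"), (120, "maps"), (150, "pay"), (1000, "music")]
sessions = split_sessions(events)
graphs = {hop: hop_edges(sessions[0], hop) for hop in (1, 2, 3)}
toy_embeddings = {"chat": [1.0, 0.0], "mail": [0.0, 1.0], "maps": [1.0, 1.0], "pay": [0.0, 0.0]}
node_vectors = propagate(toy_embeddings, graphs[1], num_layers=1)
```

Running the same `propagate` call once per hop graph gives one set of node vectors per structural range, which matches the "processed independently" description; the attention readout that turns them into a graph-level embedding is omitted here.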
A short‑term “immediate intent” vector is formed by concatenating embeddings of the last K apps (e.g., K = 5) and applying a linear projection. Hop‑level intent attention then computes softmax‑normalized similarity scores between this intent vector and each hop’s graph embedding, yielding weights that dynamically balance the contribution of each structural range. The weighted sum of hop embeddings forms the comprehensive session representation.
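The hop-level weighting can be illustrated with a small sketch. It assumes the similarity score is a plain dot product between the intent vector and each hop embedding, which the summary does not specify, and it takes the intent vector as given rather than computing the concatenate-and-project step. The names `softmax` and `hop_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hop_attention(intent, hop_embeddings):
    """Weight each hop embedding by its similarity to the intent vector.

    Assumption: similarity is a dot product; the paper may use a different score.
    Returns (weights, session_vector).
    """
    scores = [sum(i * h for i, h in zip(intent, hop)) for hop in hop_embeddings]
    weights = softmax(scores)
    dim = len(intent)
    session = [
        sum(w * hop[d] for w, hop in zip(weights, hop_embeddings))
        for d in range(dim)
    ]
    return weights, session


# Toy example: the intent vector aligns with the 1-hop embedding only.
intent = [1.0, 0.0]
hops = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # 1-hop, 2-hop, 3-hop
weights, session_vector = hop_attention(intent, hops)
```

In this toy case the 1-hop embedding receives the largest weight, while the other two hops tie because both score zero against the intent vector.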
Temporal context (hour‑of‑day) and spatial context (base‑station‑derived POI features) are embedded separately and fused with the app embeddings via a Cross‑Modal Gated Fusion (CMGF) module, which learns sigmoid gates to modulate each modality’s influence. The fused representation, together with the immediate intent vector, is fed into a Transformer encoder‑decoder that models sequential intent evolution and finally predicts the probability distribution over the next app.
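The gating idea behind CMGF can be sketched as follows. The summary states only that sigmoid gates modulate each modality, so the exact form used here is an assumption: one element-wise gate per context modality, computed from the app and context vectors with per-dimension weights, and a fused vector formed as the app embedding plus the gated context vectors. All names and parameters (`gate`, `gated_fusion`, `time_params`, `space_params`) are hypothetical, and in a real model the parameters would be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(app, ctx, w_app, w_ctx, bias):
    """Element-wise sigmoid gate in (0, 1) for one context modality.

    Simplification: per-dimension scalar weights instead of full matrices.
    """
    return [
        sigmoid(wa * a + wc * c + b)
        for a, c, wa, wc, b in zip(app, ctx, w_app, w_ctx, bias)
    ]


def gated_fusion(app, time_ctx, space_ctx, time_params, space_params):
    """Add each context vector to the app embedding, scaled by its gate."""
    g_time = gate(app, time_ctx, *time_params)
    g_space = gate(app, space_ctx, *space_params)
    return [
        a + gt * t + gs * s
        for a, t, s, gt, gs in zip(app, time_ctx, space_ctx, g_time, g_space)
    ]


app_vec = [1.0, 2.0]
time_vec = [2.0, 4.0]   # e.g. an hour-of-day embedding
space_vec = [4.0, 0.0]  # e.g. a POI-feature embedding

zeros = [0.0, 0.0]
half_open = (zeros, zeros, zeros)          # every gate equals sigmoid(0) = 0.5
closed = (zeros, zeros, [-50.0, -50.0])    # large negative bias: gate near 0

fused_half = gated_fusion(app_vec, time_vec, space_vec, half_open, half_open)
fused_closed = gated_fusion(app_vec, time_vec, space_vec, closed, closed)
```

With all gates at 0.5 each context vector contributes half its value; with the gates driven toward zero the fused vector collapses to the app embedding, which is how a gate can suppress an uninformative modality.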
Experiments on two real-world datasets (one from Tsinghua University, China, and another from Singapore) show that MISApp consistently outperforms strong baselines, including GRU-based RNNs, Transformer-based sequence models, and graph-based methods such as SA-GCN and DUGN, both in standard settings and under cold-start conditions where each user has ≤5 historical interactions. Reported gains are 4–7 percentage points in Top-K accuracy in the full-data regime and 10–12 points in cold-start scenarios. Inference latency remains low (≈1–2 ms per prediction), supporting real-time deployment.
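For readers unfamiliar with the metric, Top-K accuracy counts a prediction as correct when the app actually launched next appears among the K highest-scored candidates. The sketch below is a generic illustration of that definition, not the paper's evaluation code; the function name `top_k_accuracy` and the toy scores are hypothetical.

```python
def top_k_accuracy(score_rows, targets, k):
    """Fraction of cases whose true next app is among the k top-scored apps.

    score_rows: list of dicts mapping app -> predicted score.
    targets: list of the apps that were actually launched next.
    """
    hits = 0
    for scores, target in zip(score_rows, targets):
        ranked = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += target in ranked
    return hits / len(targets)


# Toy predictions for two sessions (made-up scores, not results from the paper).
score_rows = [
    {"chat": 0.6, "mail": 0.3, "maps": 0.1},
    {"chat": 0.2, "mail": 0.5, "maps": 0.3},
]
targets = ["mail", "chat"]
```

A "percentage point" gain in this metric is the absolute difference between two such fractions expressed in percent.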
Ablation studies confirm the importance of multi‑hop graphs, hop‑level attention, and contextual embeddings. Visualization of hop attention weights shows that 1‑hop dominates when immediate transitions are strong, while higher hops receive higher weight when longer‑range routines or cross‑category patterns emerge, providing interpretability.
Limitations include potential sparsity of higher‑hop edges in very short sessions and reliance on POI data that may not generalize across regions. Future work could explore dynamic graph updates, richer multimodal contexts, and hybrid models that combine session‑level graph signals with long‑term user profiles. Overall, MISApp offers a compelling solution for accurate, interpretable, and efficient next‑app prediction, especially in environments where user history is limited.