Llama-Mob: Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting. Traditional methods often rely on hand-crafted, domain-specific models and typically focus on short-term predictions, which struggle to generalize across diverse urban environments. In this study, we introduce Llama3-8B-Mob, a large language model fine-tuned with instruction tuning, for long-term citywide mobility prediction in a Q&A manner. We validate our approach using large-scale human mobility data from four metropolitan areas in Japan, focusing on predicting individual trajectories over the next 15 days. The results demonstrate that Llama3-8B-Mob excels in modeling long-term human mobility, surpassing the state of the art on multiple prediction metrics. It also displays strong zero-shot generalization capabilities, generalizing effectively to other cities even when fine-tuned only on limited samples from a single city. Moreover, our method is general and can be readily extended to the next-POI prediction task; the corresponding results are included in this paper. For brevity, we refer to our model as Llama-Mob. Source codes are available at https://github.com/TANGHULU6/Llama3-8B-Mob.


💡 Research Summary

This paper introduces Llama‑Mob, an instruction‑tuned version of the open‑source Llama‑3‑8B language model, for long‑term, city‑scale human mobility prediction. The authors reframe the trajectory forecasting task as a question‑answer (Q&A) problem: an instruction block defines the model’s role, the target city’s grid layout, trajectory format, and required JSON output; a question block supplies the user’s historical trajectory with masked locations (denoted as 999,999); and the answer block contains the predicted coordinates. By converting the spatio‑temporal prediction into a natural‑language prompting task, the model can leverage its built‑in world knowledge and reasoning abilities.
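The Q&A reframing above can be sketched as a small prompt-assembly function. The block structure (instruction, question, JSON answer) and the 999,999 mask token come from the paper's description; the exact wording of the template below is a hypothetical reconstruction, not the authors' verbatim prompt.

```python
import json

MASK = 999_999  # masked-location token used in the challenge data


def build_prompt(city: str, history: list[dict]) -> str:
    """Assemble one instruction-tuning sample in the paper's Q&A style.

    `history` is a list of records like {"day": d, "slot": t, "x": x, "y": y},
    where records to predict carry x = y = MASK. The phrasing below is an
    illustrative assumption, not the authors' exact template.
    """
    instruction = (
        f"You are a mobility predictor for city {city}, whose map is a "
        "200 x 200 grid of 500 m cells with 30-minute time slots. "
        "Records are (day, slot, x, y); masked locations appear as 999999. "
        "Answer with a JSON list of predicted (x, y) coordinates."
    )
    question = "Trajectory: " + json.dumps(history)
    return instruction + "\n\n" + question


# Minimal usage: one observed record and one masked record to predict.
sample = build_prompt("B", [
    {"day": 1, "slot": 17, "x": 84, "y": 121},
    {"day": 61, "slot": 17, "x": MASK, "y": MASK},
])
```

During fine-tuning, the answer block (the ground-truth coordinates serialized as JSON) would be appended after the question, and the cross-entropy loss computed over the answer tokens.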

Training data come from the Human Mobility Challenge 2024, which provides 75‑day check‑in logs for four Japanese metropolitan areas (cities A‑D). Each record is discretized onto a 200 × 200 grid (500 m cells) and 30‑minute time slots. The authors construct a fine‑tuning corpus by sampling users from each city, formatting each sample according to the Q&A template, and applying Low‑Rank Adaptation (LoRA) adapters (rank = 16) to the key, query, value, and output projections of the transformer. This parameter‑efficient approach reduces trainable parameters to ~42 M while preserving the 8 B‑parameter backbone. Training uses token‑level cross‑entropy, a batch size of 1 with gradient accumulation over four steps, a learning rate of 2e‑3, three epochs, and a cosine scheduler. The model is quantized to 4‑bit to cut GPU memory usage by 70 %.
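The discretization described above (200 × 200 grid of 500 m cells, 30-minute slots) can be sketched as follows. The dataset itself ships already discretized, so the grid origin and metric offsets here are illustrative assumptions.

```python
from datetime import datetime

CELL_M = 500    # 500 m grid cells
GRID = 200      # 200 x 200 grid
SLOT_MIN = 30   # 30-minute time slots (48 per day)


def to_cell(east_m: float, north_m: float) -> tuple[int, int]:
    """Map metric offsets from the city's grid origin to (x, y) cell indices,
    clamped to the 200 x 200 grid."""
    x = min(int(east_m // CELL_M), GRID - 1)
    y = min(int(north_m // CELL_M), GRID - 1)
    return x, y


def to_slot(ts: datetime) -> int:
    """Map a timestamp to one of the 48 half-hour slots in its day."""
    return (ts.hour * 60 + ts.minute) // SLOT_MIN


cell = to_cell(12_340.0, 61_200.0)            # -> (24, 122)
slot = to_slot(datetime(2024, 1, 1, 9, 45))   # -> 19
```

Each discretized record (day, slot, x, y) then becomes one line of the trajectory serialized into the Q&A prompt.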

For baselines, the authors adopt LP‑Bert, the champion model of the 2023 Human Mobility Prediction Challenge, which treats each spatio‑temporal record as a token and adds a city embedding to enable multi‑city prediction. Evaluation metrics are Dynamic Time Warping (DTW) for shape similarity and GEO‑BLEU, a spatially aware n‑gram metric. Because inference with a large LLM is costly (≈5 minutes per trajectory), the validation set is limited to 100 randomly selected users per city.
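For intuition about the DTW metric, here is a standard dynamic-programming implementation over grid-cell trajectories with Euclidean ground distance; the challenge's exact distance and normalization conventions may differ.

```python
import math


def dtw(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> float:
    """Dynamic Time Warping cost between two (x, y) trajectories.

    D[i][j] holds the minimal cumulative cost of aligning the first i points
    of `a` with the first j points of `b`; each step pays the Euclidean
    distance between the matched points.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            D[i][j] = d + min(D[i - 1][j],      # insertion
                              D[i][j - 1],      # deletion
                              D[i - 1][j - 1])  # match
    return D[n][m]


# Identical trajectories cost 0.0; shifted ones accumulate ground distance.
cost = dtw([(0, 0), (1, 1)], [(0, 0), (1, 1)])  # -> 0.0
```

Lower DTW means the predicted trajectory's shape tracks the ground truth more closely, which complements GEO-BLEU's n-gram-style spatial overlap score.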

Results show that Llama‑Mob fine‑tuned on a single city already outperforms LP‑Bert across all cities. For example, when fine‑tuned on city B, the model achieves an average DTW of 26.32 (lower is better) and GEO‑BLEU of 0.3322 (higher is better), with a mean rank of 2.5 versus LP‑Bert’s rank of 4.17. Combining data from cities A and B yields the best overall performance (DTW = 25.39, GEO‑BLEU = 0.3541). These findings indicate that LLMs possess a strong inherent understanding of human mobility patterns that transfers across urban contexts without city‑specific retraining.

However, the efficiency analysis reveals substantial overhead. Training Llama-Mob (A + B) takes 6.64 days (2.4x longer than LP-Bert), while inference per trajectory averages 225.6 seconds, roughly 16,000x slower. The auto-regressive decoding causes inference time to scale linearly with trajectory length, with the longest cases exceeding 15 minutes. The authors acknowledge this as a major barrier to real-time deployment and suggest future work on faster decoding, model compression, and multi-modal extensions (e.g., integrating maps, traffic, weather).

A case study on a city‑B user visualizes the historical 60‑day trajectory and the predictions from both models. Llama‑Mob captures smoother, more realistic movement patterns, especially for longer trips and weekend variations, whereas LP‑Bert shows more abrupt jumps.

The paper also notes that the same instruction‑tuning framework was applied to a next‑POI prediction task, where Llama‑Mob achieved top rankings in the 2024 Human Mobility Prediction Challenge using only 16 % of the training data.

In summary, Llama‑Mob demonstrates that a modest amount of domain‑specific instruction fine‑tuning can transform a general‑purpose LLM into a state‑of‑the‑art mobility predictor, achieving superior long‑term accuracy and zero‑shot cross‑city generalization. While computational cost remains a challenge, the approach opens a promising avenue for leveraging LLMs in urban analytics, disaster response, and transportation planning.

