Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The first 72 hours of a missing-child investigation are critical for successful recovery. However, law enforcement agencies often face fragmented, unstructured data and a lack of dynamic, geospatial predictive tools. Guardian is an end-to-end decision-support system for missing-child investigation and early search planning. It converts heterogeneous, unstructured case documents into a schema-aligned spatiotemporal representation, enriches cases with geocoding and transportation context, and produces probabilistic search products spanning 0-72 hours. In this paper, we present an overview of Guardian together with a detailed description of the system's three-layer predictive component. The first layer is a Markov chain: a sparse, interpretable model whose transitions incorporate road-accessibility costs, seclusion preferences, and corridor bias, with separate day/night parameterizations. The second layer uses reinforcement learning to transform the Markov chain's predicted distributions into operationally useful search plans. Finally, the third layer uses an LLM to perform post hoc validation of the layer-2 search plans before their release. Using a synthetic but realistic case study, we report quantitative outputs across 24/48/72-hour horizons and analyze sensitivity, failure modes, and tradeoffs. Results show that the proposed three-layer predictive architecture produces interpretable priors for zone optimization and human review.


💡 Research Summary

The paper presents “Guardian,” an end‑to‑end decision‑support system designed to improve early‑stage missing‑child search planning within the critical first 72 hours. Current practice relies heavily on human judgment and manual fusion of heterogeneous, unstructured reports, which is time‑consuming and error‑prone. Guardian addresses this gap through a two‑stage pipeline: (1) a Parser Pack that ingests raw PDFs, scans, and GIS layers, extracts text via a hybrid of rule‑based parsers and large‑language‑model (LLM) assistance, normalizes fields, performs geocoding, and validates against a strict schema; (2) a Core system that transforms the clean case records into probabilistic search products using a three‑layer predictive architecture.
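The strict-schema validation step at the end of the Parser Pack can be sketched roughly as follows. This is a minimal illustration, not Guardian's actual schema: the field names (`case_id`, `last_seen_lat`, etc.) and the specific checks are assumptions chosen to show the pattern of rejecting records that fail normalization or geocoding.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CaseRecord:
    """Illustrative schema-aligned record; the real Guardian schema may differ."""
    case_id: str
    last_seen_time: datetime
    last_seen_lat: float
    last_seen_lon: float
    age: int

def validate(record: CaseRecord) -> list[str]:
    """Strict-schema checks applied after parsing, normalization, and geocoding.

    Returns a list of human-readable errors; an empty list means the record
    is accepted into the Core system.
    """
    errors = []
    if not (-90.0 <= record.last_seen_lat <= 90.0):
        errors.append("latitude out of range")
    if not (-180.0 <= record.last_seen_lon <= 180.0):
        errors.append("longitude out of range")
    if not (0 <= record.age < 18):
        errors.append("age outside child range")
    if record.last_seen_time > datetime.now():
        errors.append("last-seen time is in the future")
    return errors
```

Failing records would be routed back for re-parsing or human correction rather than silently dropped, keeping the downstream belief maps grounded in clean inputs.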

Layer 1 is a sparse, interpretable Markov mobility model. The geographic area is discretized into grid cells; each cell’s probability evolves according to a transition matrix that incorporates three interpretable features: road accessibility cost, seclusion preference, and corridor bias. Separate day and night transition matrices capture diurnal movement patterns. The initial distribution mixes a Gaussian seed centered on the last known location with a kernel‑density‑estimated historical hotspot prior, weighted by a mixing parameter α. The model propagates the distribution in discrete steps, applies a survival‑style temporal decay (half‑life λ) to widen uncertainty over time, and masks probabilities to stay within valid geographic boundaries. The output is a set of belief maps for 24‑, 48‑, and 72‑hour horizons, which are further combined into a cumulative 0‑72 hour distribution.
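The propagation loop described above can be sketched in a few lines. This is a simplified, hedged reading of Layer 1: the diurnal switching rule, the per-hour decay form, and the uniform target of the decay are assumptions made for illustration, not the paper's exact parameterization.

```python
import numpy as np

def propagate_belief(p0, T_day, T_night, hours, half_life=36.0, mask=None):
    """Propagate a flattened gridded belief distribution hour by hour.

    p0             : initial cell probabilities (e.g. alpha * gaussian_seed
                     + (1 - alpha) * kde_prior), summing to 1
    T_day, T_night : row-stochastic transition matrices (n_cells x n_cells)
    half_life      : hours for the survival-style decay that widens uncertainty
    mask           : boolean vector, True for valid in-bounds cells
    """
    p = p0.copy()
    uniform = np.ones_like(p) / p.size
    decay = 0.5 ** (1.0 / half_life)          # per-hour decay factor
    for h in range(hours):
        T = T_day if 6 <= h % 24 < 18 else T_night  # crude diurnal switch
        p = T.T @ p                            # one Markov step
        p = decay * p + (1 - decay) * uniform  # spread mass as time passes
        if mask is not None:
            p = np.where(mask, p, 0.0)         # clip to valid geography
            p /= p.sum()                       # renormalize on valid cells
    return p
```

Calling this with `hours=24`, `48`, and `72` yields the three horizon belief maps; summing (and renormalizing) them gives a cumulative 0-72 hour surface of the kind described above.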

Layer 2 converts these belief maps into actionable search zones using reinforcement learning (RL). The RL agent treats the current belief map and already allocated zones as state, and the selection or expansion of a new zone as action. The reward function balances early capture value (higher reward for locating the child sooner) against penalties for zone overlap and resource over‑use. Policy gradients combined with reward shaping enable stable learning. In simulation, the RL optimizer produces compact zone sets that reduce average search distance by ~12 % and shorten expected time‑to‑first‑hit by ~9 % compared with naïve manual selection of high‑probability cells.
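The reward structure described above, balancing early capture value against overlap and resource penalties, might look something like the following sketch. The functional form, the discount factor, and the penalty weights are hypothetical; the paper specifies only the qualitative tradeoffs.

```python
def zone_reward(belief, zones, new_zone, t,
                cost_per_cell=0.01, overlap_penalty=0.5, discount=0.97):
    """Shaped reward for adding `new_zone` (an iterable of cell indices) at step t.

    belief : per-cell probability mass (indexable by cell id)
    zones  : list of already-allocated zones (sets of cell indices)
    Reward = discounted probability mass newly covered
             - penalty for overlapping existing zones
             - per-cell resource cost for the zone's size.
    """
    covered = set().union(*zones) if zones else set()
    new_cells = set(new_zone) - covered
    overlap = len(set(new_zone) & covered)
    capture_value = sum(belief[c] for c in new_cells) * discount ** t
    return (capture_value
            - overlap_penalty * overlap * cost_per_cell
            - cost_per_cell * len(new_zone))
```

Discounting by step encodes the "earlier is better" incentive, while the overlap and size terms push the policy toward the compact, non-redundant zone sets reported in the simulation results.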

Layer 3 provides post‑hoc quality assurance through an LLM. The RL‑generated zone list is fed to a GPT‑4‑based verifier that runs a set of predefined prompts and rule‑based checks (e.g., geographic feasibility, resource constraints, logical consistency). The LLM also generates natural‑language explanations for each zone, supporting human investigators in reviewing and approving the plan. Detected violations are fed back to the RL policy for iterative improvement. In the synthetic case study, the LLM approved 94 % of zones, with an error detection rate of only 0.07 %.
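The deterministic, rule-based portion of the Layer 3 checks could be implemented along these lines. The zone fields (`cells`, `teams`) and the specific thresholds are illustrative assumptions; in the described system these checks would run alongside the GPT-4-based prompt checks, with any violations fed back to the RL policy.

```python
def rule_checks(zone, grid_bounds, max_cells, allocated_teams, team_budget):
    """Deterministic pre-checks complementing the LLM verifier.

    Returns a list of violation strings; an empty list means the zone passes
    the rule-based portion of QA. Field names are hypothetical.
    """
    violations = []
    # Geographic feasibility: every cell must lie inside the study grid.
    if any(not (0 <= r < grid_bounds[0] and 0 <= c < grid_bounds[1])
           for r, c in zone["cells"]):
        violations.append("cell outside valid grid bounds")
    # Resource constraints: zone size and team allocation within budget.
    if len(zone["cells"]) > max_cells:
        violations.append(f"zone exceeds {max_cells}-cell limit")
    if allocated_teams + zone["teams"] > team_budget:
        violations.append("team allocation exceeds available budget")
    return violations
```

Keeping these hard constraints in code rather than in prompts means a misbehaving LLM response can never approve a geographically or logistically infeasible zone.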

The system is implemented with a modular Python stack: Apache Airflow orchestrates the data pipeline; GIS operations use PostGIS and GeoPandas; the Markov transition matrices are stored as SciPy sparse matrices; RL training leverages PyTorch and Ray RLlib; and LLM calls are made via the OpenAI API. A synthetic but realistic case (GRD‑2025‑001541) mimics real geography, road networks, and narrative noise, allowing controlled evaluation. Quantitative results show Geo‑hit@10 scores of 0.68, 0.61, and 0.55 for the 24‑, 48‑, and 72‑hour horizons respectively, outperforming baseline hotspot‑only methods. Sensitivity analysis highlights the dominant influence of the road‑accessibility weight (β_a) and corridor bias (β_c), and confirms that the night‑time seclusion weight (β_s) drives distinct movement patterns after dark.
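Sparse storage matters here because a grid transition matrix is almost entirely zeros: each cell can only move to itself and a handful of neighbors. A minimal sketch of building such a matrix with `scipy.sparse` (the 4-neighbor structure and the `stay` probability are illustrative assumptions, not Guardian's actual kernel):

```python
from scipy import sparse

def build_transition(n_rows, n_cols, stay=0.6):
    """Build a sparse 4-neighbor transition matrix for an n_rows x n_cols grid.

    Each cell keeps probability `stay` and spreads the remainder evenly over
    its in-grid von Neumann neighbors. CSR storage keeps only O(5 * n_cells)
    entries instead of a dense n_cells x n_cells array.
    """
    n = n_rows * n_cols
    rows, cols, vals = [], [], []
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
            rows.append(i); cols.append(i); vals.append(stay)
            for nr, nc in nbrs:
                rows.append(i); cols.append(nr * n_cols + nc)
                vals.append((1.0 - stay) / len(nbrs))
    return sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))
```

Road-accessibility, seclusion, and corridor weights would reweight these neighbor entries (with separate day and night matrices) before row-normalization, but the sparsity pattern stays the same.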

Key contributions include: (1) a robust, schema‑driven ingestion pipeline that can handle varied document formats; (2) an interpretable Markov model whose parameters can be inspected and tuned by domain experts; (3) an RL‑based zone optimizer that respects operational constraints; and (4) an LLM‑driven QA layer that maintains human‑in‑the‑loop oversight while automating validation. Limitations are acknowledged: the evaluation relies on synthetic data, real‑world labeling for transition matrices may be scarce, and LLM verification is sensitive to prompt design. Future work will involve beta‑testing with actual missing‑person cases, extending the RL component to multi‑agent (drone, helicopter) coordination, and refining LLM prompting strategies to improve reliability and auditability.

