SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins
Path planning under wireless performance constraints is a complex challenge in robot navigation. However, naively incorporating such constraints into classical planning algorithms often incurs prohibitive search costs. In this paper, we propose SCoTT, a wireless-aware path planning framework that leverages vision-language models (VLMs) to co-optimize average path gains and trajectory length using wireless heatmap images and ray-tracing data from a digital twin (DT). At the core of our framework is Strategic Chain-of-Thought Tasking (SCoTT), a novel prompting paradigm that decomposes the exhaustive search problem into structured subtasks, each solved via chain-of-thought prompting. To establish strong baselines, we compare classical A* and wireless-aware extensions of it, and derive DP-WA*, an optimal, iterative dynamic programming algorithm that incorporates all path gains and distance metrics from the DT, but at significant computational cost. In extensive experiments, we show that SCoTT achieves path gains within 2% of DP-WA* while consistently generating shorter trajectories. Moreover, SCoTT’s intermediate outputs can be used to accelerate DP-WA* by reducing its search space, saving up to 62% in execution time. We validate our framework using four VLMs, demonstrating effectiveness across both large and small models, thus making it applicable to a wide range of compact models at low inference cost. We also show the practical viability of our approach by deploying SCoTT as a ROS node within Gazebo simulations. Finally, we discuss data acquisition pipelines, compute requirements, and deployment considerations for VLMs in 6G-enabled DTs, underscoring the potential of natural language interfaces for wireless-aware navigation in real-world applications.
💡 Research Summary
The paper tackles the challenging problem of wireless‑aware robot navigation within high‑fidelity digital twins (DTs) that embed ray‑tracing based wireless data. Classical planners such as A* can easily incorporate distance minimization but struggle when side constraints like average path‑gain thresholds are added; naïve extensions either ignore the constraint (N‑WA*) or become computationally prohibitive (DP‑WA*). To bridge this gap, the authors introduce SCoTT (Strategic Chain‑of‑Thought Tasking), a framework that leverages vision‑language models (VLMs) to co‑optimize average wireless gain and trajectory length while keeping computational cost low.
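To make the trade-off between distance and wireless gain concrete, here is a minimal sketch of one way to fold path gain into A* as a soft per-step penalty. It is an illustration only and is not the paper's N‑WA* or DP‑WA*: the grid representation, the function name `wireless_aware_astar`, and the weight `lam` are all assumptions introduced here.

```python
import heapq

def wireless_aware_astar(gain_db, blocked, start, goal, lam=0.0):
    """A* on a 4-connected grid (hypothetical sketch, not the paper's algorithm).

    gain_db: 2D list of path gains in dB (higher is better).
    blocked: set of (row, col) obstacle cells.
    lam:     assumed weight on the gain penalty; lam = 0 is plain shortest-path A*.
    """
    rows, cols = len(gain_db), len(gain_db[0])
    best = max(max(row) for row in gain_db)  # strongest gain on the map

    def h(cell):
        # Manhattan distance stays admissible because every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]
    g_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or nxt in blocked:
                continue
            # Step cost: unit distance plus a penalty for entering a weak-gain cell.
            step = 1.0 + lam * (best - gain_db[nr][nc])
            ng = g + step
            if ng < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None  # goal unreachable
```

With `lam=0` the planner ignores the heatmap entirely; raising `lam` trades extra path length for stronger average gain. A soft penalty of this kind does not enforce a hard threshold on average path gain, which is the side constraint the summary describes as difficult for classical planners.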
SCoTT’s core innovation is a multi‑stage prompting pipeline that decomposes the exhaustive search into three manageable subtasks, each solved with strategic chain‑of‑thought (SCoT) prompting. First, the VLM receives a bird’s‑eye‑view heatmap image and extracts high‑gain corridors, producing a coarse set of feasible waypoints. Second, the search space is narrowed around these corridors, effectively pruning the graph. Third, the model ingests precise gain values (provided as JSON via retrieval‑augmented generation) and runs a fine‑grained DP‑like evaluation on the reduced graph, yielding a path that respects the gain threshold. At every stage the model is required to explain its reasoning, which reduces hallucinations, and only the data relevant to that stage is supplied, which keeps each prompt within the VLM's limited context window.
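The three stages can be pictured as three separate prompts, with only the last one carrying numeric gain data. The sketch below is hypothetical: the prompt wording, the function name `stage_prompts`, and the `"row,col"` JSON key format are assumptions made for illustration and are not taken from the paper. In a real pipeline the `corridor` argument would come from the model's stage 2 answer; here it is passed in directly.

```python
import json

def stage_prompts(start, goal, corridor, gain_db):
    """Build one prompt per stage (hypothetical wording, not the paper's prompts).

    corridor: list of (row, col) cells kept after stage 2.
    gain_db:  dict mapping (row, col) -> path gain in dB for the whole map.
    """
    p1 = (
        f"Stage 1. Using the attached heatmap, reason step by step about which "
        f"regions between {start} and {goal} have high path gain, then list "
        f"coarse waypoints."
    )
    p2 = (
        "Stage 2. Given your waypoints, reason step by step about which grid "
        "cells around them should remain in the search space, then list them."
    )
    # Only gains for corridor cells are serialized, so the prompt stays small.
    subset = {f"{r},{c}": gain_db[(r, c)] for (r, c) in corridor if (r, c) in gain_db}
    p3 = (
        "Stage 3. Using the exact gains below (dB), reason step by step and "
        "return the final path as a list of cells.\n"
        + json.dumps(subset, sort_keys=True)
    )
    return [p1, p2, p3]
```

Filtering the gain table down to the corridor before serializing it is the design point worth noticing: the full ray-tracing output never has to fit in a single prompt.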
Four VLMs of varying scale are evaluated: Llama‑4‑Scout‑17B, Llama‑3.2‑11B‑Vision, SmolVLM, and Granite‑Vision‑3.2‑2B. They are tested on three benchmark DT scenarios (obstacle‑dense, mixed‑gain, long‑range). Across all models, SCoTT achieves an average path gain within 2 % of the optimal DP‑WA* solution while producing trajectories that are 5–8 % shorter. Even the smallest model (SmolVLM) stays within 1.8 % of DP‑WA*’s gain, demonstrating robustness to model size. Importantly, feeding SCoTT’s intermediate outputs into DP‑WA* reduces its search space, saving up to 62 % in execution time.
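One simple way to picture how a coarse path can shrink a planner's search space is a corridor mask: keep only the grid cells within a fixed radius of the coarse waypoints and run the exact planner on those. The summary does not say how the reduction is actually performed, so the square neighbourhood, the `radius` parameter, and the function name `corridor_cells` below are assumptions.

```python
def corridor_cells(waypoints, radius, rows, cols):
    """Return the set of grid cells within `radius` (Chebyshev distance)
    of any coarse waypoint, clipped to the map bounds.

    Hypothetical helper: the paper's actual reduction rule is not given here.
    """
    keep = set()
    for wr, wc in waypoints:
        for r in range(max(0, wr - radius), min(rows, wr + radius + 1)):
            for c in range(max(0, wc - radius), min(cols, wc + radius + 1)):
                keep.add((r, c))
    return keep


def pruned_fraction(keep, rows, cols):
    """Fraction of the full grid that the exact planner no longer has to visit."""
    return 1.0 - len(keep) / (rows * cols)
```

An exact planner can then treat every cell outside the returned set as blocked. A larger `radius` keeps more of the map and lowers the risk of cutting away the true optimum, at the cost of a smaller speed-up.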
The authors also integrate SCoTT as a ROS node in Gazebo, showing real‑time operation: inference per frame is under 120 ms for large models and under 45 ms for compact ones, well within control loop requirements. The robot successfully avoids low‑gain zones while reaching its goal, confirming practical viability.
Finally, the paper discusses deployment considerations for 6G‑enabled DTs: a data pipeline that converts ray‑tracing results into heatmap images and JSON, compute requirements (≈12 GB GPU memory, ≤30 W per inference), and scalability strategies. By combining multimodal VLM reasoning with dynamic‑programming‑style optimality, SCoTT offers a compelling solution that balances optimality, efficiency, and real‑time feasibility for future wireless‑aware autonomous systems.
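The data pipeline described above produces two views of the same ray-tracing output: an image for the coarse visual stage and JSON for the exact numeric stage. The following is a minimal sketch under stated assumptions: the input is taken to be a 2D grid of path gains in dB, the heatmap is a plain min-max scaling to 8-bit intensities, and the JSON field names (`unit`, `rows`, `cols`, `gain`) are invented for illustration.

```python
import json

def gains_to_heatmap_and_json(gain_db):
    """Convert a 2D list of path gains (dB) into 8-bit pixel intensities
    and a JSON payload. Field names and scaling are assumptions."""
    flat = [g for row in gain_db for g in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    # Min-max scale each gain to 0..255 so the strongest cell is brightest.
    pixels = [[round(255 * (g - lo) / span) for g in row] for row in gain_db]
    payload = json.dumps({
        "unit": "dB",
        "rows": len(gain_db),
        "cols": len(gain_db[0]),
        "gain": gain_db,
    })
    return pixels, payload
```

The pixel grid could be handed to any image library to render a coloured heatmap, while the JSON payload preserves the unscaled values that the image alone cannot convey.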