CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction

Notice: This research summary and analysis were generated automatically using AI technology. For accuracy, please refer to the original arXiv source.

To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond the need for specialized manipulation skills, effective information gathering is essential to completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, a novel extension of canonical Embodied Question Answering (EQA) in which effective communication is crucial for coordinating efforts without redundancy. To tackle this problem, we propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. Our framework employs conformal prediction to calibrate the generated messages, thereby minimizing receiver distractions and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly enhances the task success rate and exploration efficiency over baselines. The experiment videos, code, and dataset are available on our project website: https://comm-cp.github.io.


💡 Research Summary

The paper introduces a novel multi‑agent, multi‑task embodied question answering problem (MM‑EQA), where several heterogeneous service robots operate in a shared 3‑D household environment, each tasked with answering natural‑language questions that require visual perception, reasoning, and manipulation. Unlike prior single‑agent EQA work, MM‑EQA demands coordinated information gathering: robots must exchange observations and answers to avoid redundant exploration and to accelerate task completion. The authors argue that natural language is an ideal communication protocol because large language models (LLMs) are already trained for dialogue, but raw LLM outputs are often mis‑calibrated and over‑confident, leading to irrelevant or misleading messages that can degrade cooperation.

To address this, the authors propose CommCP, a decentralized communication framework that couples LLM‑generated messages with conformal prediction (CP) to statistically calibrate the confidence of each message. The system consists of four modules per robot: (1) Perception, where a visual‑language model extracts a set of observed objects and their attributes from RGB‑D images. (2) Communication, where each robot prompts an LLM with its observations and a partner’s target request, using a zero‑shot chain‑of‑thought prompt that yields four categorical options (A: the observed object is exactly the target; B: it is highly relevant; C/D: irrelevant); the LLM returns a probability (p_k) for each option. (3) Confidence Check, which applies split‑conformal prediction separately to options A and B. Calibration sets are built from 20 diverse HM3D scenes with ground‑truth (observed, target) pairs, allowing the system to compute quantile‑based thresholds that guarantee a user‑specified coverage (e.g., 90%). Only options whose probability exceeds the threshold are included in a prediction set that is sent to the partner; C/D are discarded. (4) Planning, where the received prediction sets are projected onto a 2‑D weighted semantic value map that guides navigation toward promising regions while avoiding already‑explored or irrelevant areas.
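The Confidence Check step can be illustrated with a minimal split-conformal sketch. This is not the paper's code: the function names, the calibration scores, and the test-time probabilities below are invented for illustration. The idea matches the summary above: nonconformity is taken as one minus the probability the LLM assigned to the ground-truth option on held-out calibration pairs, a quantile threshold is computed from those scores, and at test time only options whose probability clears the calibrated threshold enter the transmitted prediction set.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))/n empirical
    quantile of the calibration nonconformity scores. Under
    exchangeability this gives a finite-sample, distribution-free
    coverage guarantee of at least 1 - alpha."""
    n = len(cal_scores)
    level = min(1.0, math.ceil((n + 1) * (1 - alpha)) / n)
    scores = sorted(cal_scores)
    k = math.ceil(level * n) - 1  # 0-based index of the quantile
    return scores[k]

def prediction_set(option_probs, q):
    """Keep options whose nonconformity (1 - p) is within the threshold."""
    return {opt for opt, p in option_probs.items() if 1.0 - p <= q}

# Calibration: 1 - (probability the LLM gave the ground-truth option)
# on held-out (observed, target) pairs. Values here are made up.
cal_scores = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30]
q = conformal_threshold(cal_scores, alpha=0.1)  # -> 0.30 for this toy set

# Test time: hypothetical LLM probabilities for the four options.
probs = {"A": 0.80, "B": 0.12, "C": 0.05, "D": 0.03}
kept = prediction_set(probs, q) & {"A", "B"}  # C/D are never transmitted
```

With these toy numbers only option A survives the check, so only "the observed object is exactly the target" would be communicated to the partner. The paper calibrates A and B separately (one threshold each); the sketch shows a single threshold for brevity.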

The authors also construct an MM‑EQA benchmark using photorealistic HM3D indoor scenes. Three robots with distinct capabilities (e.g., cleaning, object transport, light control) are placed in each scene and assigned multiple‑choice questions (four answer options). Baselines include (i) independent single‑agent EQA, (ii) LLM‑based communication without any calibration, and (iii) rule‑based message passing. Experiments show that CommCP improves overall task success rate from 78% to 92% (a 14‑point gain) and reduces average exploration time by roughly 31% (45 s → 31 s). Moreover, the number of transmitted messages drops by more than half, indicating that calibrated communication reduces bandwidth usage and prevents “information overload.” Ablation studies confirm that removing CP leads to a steep drop in success (≈15 percentage points) and a substantial increase in erroneous messages, while treating options A and B with a single calibration threshold inflates message volume.

Key insights from the work are: (1) Statistical calibration of LLM outputs is essential for reliable multi‑robot dialogue, especially when decisions are made based on communicated information; (2) Conformal prediction provides finite‑sample, distribution‑free guarantees that can be directly integrated into a language‑driven pipeline; (3) Selective sharing of only high‑confidence, task‑relevant observations dramatically improves exploration efficiency in embodied settings.
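The finite-sample, distribution-free guarantee referred to in insight (2) is the standard split-conformal coverage bound. In the notation used above (user-specified coverage 1−α, prediction set C(·), n exchangeable calibration examples), it reads:

```latex
% Split-conformal coverage: for a new test pair (X_{n+1}, Y_{n+1})
% exchangeable with the n calibration pairs, the prediction set built
% from the ceil((n+1)(1-alpha))/n empirical quantile of the
% nonconformity scores satisfies
\Pr\bigl( Y_{n+1} \in C(X_{n+1}) \bigr) \;\ge\; 1 - \alpha
```

In CommCP's setting, this means that with α = 0.1 the transmitted prediction set contains the true relevance category (A or B) at least 90% of the time, provided the deployment distribution matches the calibration scenes.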

Limitations include reliance on simulated calibration data, which may not fully capture the domain shift encountered on physical robots, and a relatively simple 2‑D planning layer that could be extended to richer 3‑D path‑planning or multi‑objective optimization. Future work may explore online calibration, adaptive confidence thresholds, and integration with human‑in‑the‑loop interfaces.

In summary, CommCP demonstrates that coupling LLM‑based natural‑language communication with conformal prediction yields a robust, efficient, and scalable framework for cooperative embodied AI, setting a new benchmark for multi‑robot question answering and opening avenues for trustworthy language‑grounded robot collaboration.

