Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv source.

Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Existing fine-tuned models tend to reactively follow user instructions and hallucinate dimensions when the text is ambiguous. To address this, we propose a proactive agentic framework for text-to-CadQuery generation, named ProCAD, that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9 percent and lowering the invalidity ratio from 4.8 percent to 0.9 percent. Our code and datasets will be made publicly available.

💡 Research Summary

Abstract‑level Summary
The paper introduces ProCAD, a two-agent system that resolves ambiguous or inconsistent natural-language design specifications before synthesizing parametric CAD code. A proactive clarifying agent audits the user prompt, asks targeted questions only when needed, and produces a self-consistent textual specification. A coding agent then translates this specification into an executable CadQuery program. The authors construct a high-quality text-to-CadQuery dataset (about 10K samples) using a vision-language model with code-leak and completeness checks, plus a synthetic ambiguous-prompt dataset (about 6K samples) for training the clarifier via agentic supervised fine-tuning (SFT). Relative to closed-source baselines (Claude Sonnet 4.5, GPT-4o-mini), ProCAD reduces mean Chamfer distance by 79.9% and lowers the invalidity ratio from 4.8% to 0.9%, while keeping interaction overhead low (fewer than two clarification rounds on average).
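
The clarify-then-code flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Dialogue`, `consolidate`, `clarify_then_code`, and `max_rounds`, as well as the toy clarifier, coder, and user, are all hypothetical stand-ins for the fine-tuned agents and a real user.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Dialogue:
    """Original prompt plus the (question, answer) pairs collected so far."""
    prompt: str
    turns: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical interfaces (assumptions, not the paper's API):
# a clarifier returns a question to ask, or None to accept the spec as is.
Clarifier = Callable[[Dialogue], Optional[str]]
Coder = Callable[[str], str]   # specification text -> CAD program text
User = Callable[[str], str]    # question -> answer


def consolidate(dialogue: Dialogue) -> str:
    """Merge the prompt and all answers into one specification string."""
    lines = [dialogue.prompt] + [f"{q} {a}" for q, a in dialogue.turns]
    return "\n".join(lines)


def clarify_then_code(prompt: str, clarifier: Clarifier, coder: Coder,
                      user: User, max_rounds: int = 3) -> Tuple[str, Dialogue]:
    """Ask questions until the clarifier accepts, then generate code once."""
    dialogue = Dialogue(prompt)
    for _ in range(max_rounds):
        question = clarifier(dialogue)
        if question is None:          # clarifier accepts the specification
            break
        dialogue.turns.append((question, user(question)))
    return coder(consolidate(dialogue)), dialogue


# Toy stand-ins so the sketch runs end to end.
def toy_clarifier(dialogue: Dialogue) -> Optional[str]:
    if "height" not in consolidate(dialogue).lower():
        return "What is the height in mm?"
    return None


def toy_coder(spec: str) -> str:
    return "# program generated from spec:\n# " + spec.replace("\n", "\n# ")


def toy_user(question: str) -> str:
    return "5"


program, dialogue = clarify_then_code(
    "A box 10 mm wide and 20 mm long.", toy_clarifier, toy_coder, toy_user)
```

In this toy run the clarifier asks one question because the prompt omits the height, then accepts. A prompt that already states the height goes straight to the coder with no questions asked, which is the "only when necessary" behavior the summary describes.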

Technical Contributions

Problem Re‑framing: The authors argue that treating text‑to‑CAD as a single translation task is insufficient because real‑world design prompts are often under‑specified or internally contradictory.
Agentic Framework: ProCAD decomposes the task into (i) ambiguity detection & clarification, (ii) specification consolidation, and (iii) code generation. The clarifying agent is modeled as a finite‑horizon Markov Decision Process (MDP) with actions {ACCEPT, ASK(question)}. The reward balances geometric fidelity (Chamfer distance) against communication cost (λ·C(h)).
Dataset Construction: Starting from raw CadQuery programs (DeepCAD), the authors render each model from four viewpoints and prompt GPT-5-mini, used as the vision-language model, to generate natural-language descriptions conditioned on both images and code. Each description passes two checks: (a) code-leak detection, performed by an LLM rather than by regex matching, and (b) completeness verification, in which CadQuery code is regenerated from the description and compared with the original by Chamfer distance. A retry loop of up to three attempts yields more than 80% automatic acceptance; the remaining cases are manually reviewed. This pipeline produces the curated corpus of about 10K text-to-CadQuery pairs.
Synthetic Ambiguity Generation: By perturbing verified specifications (e.g., removing dimensions or introducing contradictory constraints), the authors create 6,063 ambiguous prompts paired with full clarification trajectories (questions plus simulated user answers). These trajectories train the clarifier via agentic SFT.
Model Fine‑Tuning: Both agents are built on Qwen2.5‑7B‑Instruct. The coding agent (ProCAD‑coder) is fine‑tuned on only 1.6 K unambiguous pairs yet achieves superior performance, demonstrating data efficiency. The clarifying agent (ProCAD‑clarifier) is fine‑tuned on the synthetic trajectories, learning to ask the minimal set of questions needed for a consistent spec.
Evaluation: Metrics include Chamfer distance between generated and ground-truth meshes, invalidity ratio (the share of generated programs that fail to execute), average number of clarification rounds, and token usage. ProCAD is reported to outperform Claude Sonnet 4.5 and GPT-4o-mini on these metrics, lowering the invalidity ratio to 0.9% while keeping communication cost low.
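
The metrics and the reward trade-off listed above can be made concrete with a small pure-Python sketch. Several choices here are assumptions rather than details confirmed by the summary: the Chamfer distance uses one common convention (the sum of the two mean squared nearest-neighbor distances over sampled points), the communication cost C(h) is taken to be the number of questions asked, and the reward is taken to be the negative Chamfer distance minus λ·C(h). The paper's exact definitions may differ.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def _mean_nn_sq(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean squared distance from each point in src to its nearest point in dst."""
    total = 0.0
    for x in src:
        total += min(sum((p - q) ** 2 for p, q in zip(x, y)) for y in dst)
    return total / len(src)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed convention: sum of both directions)."""
    return _mean_nn_sq(a, b) + _mean_nn_sq(b, a)


def invalidity_ratio(results: Sequence[Optional[float]]) -> float:
    """Share of samples whose program failed to execute (marked as None)."""
    return sum(r is None for r in results) / len(results)


def reward(cd: float, num_questions: int, lam: float = 0.1) -> float:
    """Assumed reward shape: geometric fidelity minus weighted communication cost."""
    return -cd - lam * num_questions


# Two tiny point sets standing in for points sampled from meshes.
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(generated, reference)

# Per-sample Chamfer distances; None marks a program that did not run.
per_sample = [0.02, None, 0.10, 0.01]
ratio = invalidity_ratio(per_sample)
```

Under this assumed reward, asking one more question only pays off when it lowers the Chamfer distance by more than `lam`, which is the accuracy-versus-burden balance that the MDP formulation is meant to capture.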

Implications and Future Work
ProCAD shows that proactive clarification can substantially improve the robustness of text-to-CAD pipelines, which matters for real engineering workflows where specifications are rarely complete. The MDP-based reward formulation provides a principled way to trade off geometric accuracy against user burden, and the summary credits it with mitigating reward-hacking risks. Future directions include extending the framework to multimodal inputs (sketches, voice), handling more complex assemblies with hierarchical constraints, and scaling the agents with larger foundation models or reinforcement learning from human feedback to further reduce clarification steps.

Conclusion
By introducing a clarifying stage before code synthesis, ProCAD transforms ambiguous natural‑language design requests into reliable, executable CadQuery programs. The combination of a rigorously curated dataset, synthetic ambiguity training, and a clear agentic formulation yields state‑of‑the‑art performance while keeping user interaction minimal, paving the way for more accessible and trustworthy AI‑assisted CAD design.
