Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
The advent of Computer-Aided Design (CAD) generative modeling stands to significantly transform the design of industrial products. Recent research has extended into the realm of Large Language Models (LLMs). In contrast to fine-tuning methods, training-free approaches typically rely on advanced closed-source LLMs, offering greater flexibility and efficiency in developing AI agents that generate parametric CAD models. However, the substantial cost of top-tier closed-source LLMs and the difficulty of deploying them locally pose challenges in practical applications. Seek-CAD is the first exploration of a locally deployed open-source reasoning LLM, DeepSeek-R1, for parametric CAD model generation with a training-free methodology. This study is also the first to incorporate both visual and Chain-of-Thought (CoT) feedback within a self-refinement mechanism for CAD model generation. Specifically, an initially generated parametric CAD model is rendered into a sequence of step-wise perspective images, which a Vision Language Model (VLM) then assesses alongside the corresponding CoT produced by DeepSeek-R1. DeepSeek-R1 then uses this feedback to refine the model for the next round of generation. Moreover, we present a novel 3D CAD model dataset structured around the SSR (Sketch, Sketch-based feature, and Refinements) triple design paradigm. The dataset covers a wide range of CAD commands, aligning well with industrial application requirements and lending itself to LLM-based generation. Extensive experiments validate the effectiveness of Seek-CAD under various metrics.
💡 Research Summary
Seek‑CAD introduces a novel, training‑free framework for generating and refining parametric 3D CAD models by leveraging the open‑source large language model (LLM) DeepSeek‑R1‑32B‑Q4 in a local deployment setting. The system addresses two major challenges in existing CAD generation pipelines: the high cost and limited accessibility of closed‑source LLMs (e.g., GPT‑4) and the lack of a mechanism to incorporate chain‑of‑thought (CoT) reasoning and visual feedback into the generation loop.
The pipeline consists of two tightly coupled stages. First, a Retrieval‑Augmented Generation (RAG) module searches a locally curated corpus of 10 000 CAD examples that follow the newly proposed SSR (Sketch, Sketch‑based feature, Refinements) paradigm. For a given textual design brief, the top‑3 retrieved (description, code) pairs are concatenated with the user prompt and a system prompt that encodes functional constraints, the SSR schema, and an illustrative example. This enriched context is fed to DeepSeek‑R1, which produces an initial Python‑like CAD script.
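The retrieval and prompt-assembly step can be sketched roughly as follows. This is an illustrative stand-in, not the paper's implementation: the corpus entries and the lexical `SequenceMatcher` similarity are hypothetical substitutes for whatever retriever Seek-CAD actually uses.

```python
# Hedged sketch of the RAG stage: rank the local SSR corpus against the
# design brief, then assemble system prompt + top-3 (description, code)
# pairs + user brief into one enriched context.
from difflib import SequenceMatcher

# Toy corpus; real entries would hold full SSR-style parametric code.
corpus = [
    {"description": "a cylinder with a fillet on the top edge", "code": "# SSR code ..."},
    {"description": "a rectangular plate with four holes", "code": "# SSR code ..."},
    {"description": "a revolved flange with a chamfer", "code": "# SSR code ..."},
]

def retrieve_top_k(query: str, corpus: list, k: int = 3) -> list:
    """Rank corpus entries by similarity to the brief (top-3 in Seek-CAD)."""
    return sorted(
        corpus,
        key=lambda e: SequenceMatcher(None, query, e["description"]).ratio(),
        reverse=True,
    )[:k]

def build_prompt(brief: str, examples: list, system_prompt: str) -> str:
    """Concatenate the system prompt, retrieved pairs, and the user brief."""
    shots = "\n\n".join(
        f"Description: {e['description']}\nCode:\n{e['code']}" for e in examples
    )
    return f"{system_prompt}\n\n{shots}\n\nUser brief: {brief}"

examples = retrieve_top_k("a plate with holes", corpus)
prompt = build_prompt("a plate with holes", examples,
                      "You generate SSR-style CAD code.")
```

The resulting `prompt` string is what would be handed to DeepSeek-R1 for the initial generation pass.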
Second, the script is executed on a geometry kernel (Python‑OCC) to render a sequence of perspective images that capture each intermediate modeling step as well as the final shape. Simultaneously, DeepSeek‑R1 generates a CoT narrative that explains the logical design steps. Both the image sequence and the CoT are submitted to a vision‑language model (Gemini‑2.0) for alignment assessment. The VLM returns a binary feedback signal (positive/negative) together with natural‑language comments describing mismatches between the visual output and the CoT. If the feedback is negative, the comments are appended to the prompt and DeepSeek‑R1 is invoked again to refine the CAD code. This self‑refinement loop iterates up to two times (k = 2) or until positive feedback is received, yielding a final, syntactically correct, and geometrically faithful CAD model.
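The control flow of this loop can be sketched as below. The callables (`generate`, `render`, `assess`) are hypothetical stand-ins for DeepSeek-R1 generation, Python-OCC rendering, and the Gemini-2.0 alignment check; here they are stubbed so the loop structure itself runs.

```python
# Hedged sketch of the self-refinement loop: generate, render, assess,
# and retry with VLM comments folded back into the prompt, up to k = 2
# refinement rounds (so at most 3 generation attempts total).
def self_refine(brief, generate, render, assess, max_rounds=2):
    """Return (code, success) after at most max_rounds refinements."""
    feedback, code = None, None
    for _ in range(max_rounds + 1):
        code, cot = generate(brief, feedback)  # DeepSeek-R1: CAD code + CoT narrative
        images = render(code)                  # step-wise perspective renders (Python-OCC)
        ok, comments = assess(images, cot)     # VLM: binary signal + mismatch comments
        if ok:
            return code, True
        feedback = comments                    # appended to the next prompt
    return code, False

# Toy stubs that succeed on the second attempt, to exercise the loop.
attempts = {"n": 0}
def generate(brief, feedback):
    attempts["n"] += 1
    return f"code-v{attempts['n']}", "cot narrative"
def render(code):
    return ["step1.png", "step2.png", "final.png"]
def assess(images, cot):
    return (attempts["n"] >= 2, "fillet radius does not match the CoT")

code, ok = self_refine("a bracket with filleted edges", generate, render, assess)
```

In this toy run the first attempt is rejected, its comments become feedback, and the second attempt passes, mirroring one refinement round of the pipeline.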
The SSR paradigm extends the conventional Sketch‑Extrusion (SE) approach by explicitly modeling three components for each design unit: (1) a sketch, (2) a sketch‑based feature such as extrude, revolve, or sweep, and (3) optional refinement features like fillet, chamfer, or shell. Complex objects are constructed by chaining multiple SSR triples and applying Boolean operations. To support precise referencing between sketch entities and their derived geometry, the authors introduce a lightweight “CapType” mechanism that records explicit links, enabling robust handling of later refinements.
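A minimal data-model sketch of an SSR triple is given below. The field names and the `cap_links` record are illustrative guesses at the shape of the schema, not the paper's exact representation; `build_model` is a placeholder for the real geometry-kernel composition.

```python
# Hedged sketch of the SSR (Sketch, Sketch-based feature, Refinements)
# triple as a data structure, with a CapType-style link table for
# referencing derived geometry from sketch entities.
from dataclasses import dataclass, field

@dataclass
class Sketch:
    plane: str          # e.g. "XY"
    loops: list         # closed profile curves (toy representation)

@dataclass
class SSRTriple:
    sketch: Sketch
    feature: str        # "extrude" | "revolve" | "sweep"
    params: dict        # feature parameters, e.g. {"distance": 10.0}
    refinements: list = field(default_factory=list)  # e.g. [("fillet", {"radius": 1.0})]
    cap_links: dict = field(default_factory=dict)    # sketch entity -> derived face

def build_model(triples: list, booleans: list) -> list:
    """Chain SSR triples with Boolean ops; placeholder for kernel composition."""
    assert len(booleans) == len(triples) - 1
    return list(zip(triples, ["new"] + booleans))

base = SSRTriple(Sketch("XY", ["rect"]), "extrude", {"distance": 10.0})
boss = SSRTriple(Sketch("XY", ["circle"]), "extrude", {"distance": 5.0},
                 refinements=[("fillet", {"radius": 1.0})])
part = build_model([base, boss], ["union"])
```

The key design point mirrored here is that refinements hang off a specific triple (via its cap links) rather than off the global shape, which is what makes later fillets and chamfers robust to upstream edits.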
A new dataset of 40 000 SSR‑compliant CAD samples is released. Each entry contains the parametric code, a natural‑language description generated by GPT‑4o, and metadata describing the SSR triples. The dataset covers a broad spectrum of CAD commands, including advanced features (fillet, chamfer, shell) and Boolean compositions, thereby aligning closely with real‑world industrial design requirements.
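One dataset entry might look like the following. The key names are hypothetical, inferred from the description above (code, GPT-4o caption, SSR metadata); the released dataset's actual schema may differ.

```python
# Illustrative (hypothetical) shape of one SSR-compliant dataset entry.
import json

entry = {
    "code": "# ...parametric SSR code...",
    "description": "A flat plate with two holes and filleted edges.",  # GPT-4o caption
    "metadata": {
        "num_ssr_triples": 2,
        "features": ["extrude", "extrude"],
        "refinements": ["fillet"],
        "booleans": ["cut"],
    },
}

serialized = json.dumps(entry, indent=2)
```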
Experimental evaluation demonstrates several key advantages. Geometric fidelity, measured by surface‑area and volume overlap with ground‑truth models, exceeds 92 % on average. The initial code compilation success rate is 85 %, which rises to 97 % after the refinement loop, indicating that visual‑CoT feedback effectively resolves syntax and logical errors. Alignment between textual descriptions and final models, quantified by CLIPScore, improves from 0.62 (baseline without feedback) to 0.78 with the full Seek‑CAD pipeline. Compared against recent training‑free baselines such as 3D‑Premise and CADCodeVerify, Seek‑CAD reduces error rates on complex multi‑SSR models by 45 % and 38 % respectively.
The authors also discuss practical considerations. Running DeepSeek‑R1 locally on an 8‑GPU (80 GB) server enables real‑time inference without external API calls, making the approach suitable for secure, on‑premise deployment in manufacturing environments. The reliance on Gemini‑2.0 for visual feedback introduces a cloud dependency; future work could replace it with an open‑source VLM to achieve a fully offline stack. Additionally, while SSR currently focuses on sketch‑based extrusion and refinement, extending the paradigm to surface‑based operations (e.g., NURBS) is identified as a promising direction.
In summary, Seek‑CAD showcases how an open‑source LLM, combined with step‑wise visual feedback and chain‑of‑thought reasoning, can autonomously generate high‑quality parametric CAD models. The framework reduces dependence on costly proprietary models, introduces a systematic self‑refinement loop, and provides a scalable dataset that together advance the state of AI‑assisted CAD design toward practical industrial adoption.