A Preliminary Assessment of Coding Agents for CFD Workflows


We investigate the use of tool-using coding agents to automate end-to-end workflows in the open-source CFD package OpenFOAM. Building on general-purpose coding agent interfaces, we introduce a lightweight configuration that guides an agent toward tutorial reuse and log-driven repair to improve case setup and execution. We evaluate this approach on the FoamBench-Advanced benchmark, covering both tutorial-derivative and planar 2D obstacle-flow tasks. For tutorial-derivative cases, prompt guidance dramatically increases execution completion rates and reduces unnecessary tool calls. For obstacle-flow cases, stronger language models such as GPT-5.2 markedly improve mesh generation and overall task completion compared to earlier models. Our findings show that coding agents can correctly execute a range of CFD simulations with minimal configuration and that model capability significantly influences performance on tasks requiring geometry and mesh creation. These results suggest that coding agents have practical utility for automating portions of CFD workflows while highlighting areas that require further investigation.

💡 Research Summary

This paper investigates the use of general‑purpose coding agents, equipped with function‑calling tools, to automate end‑to‑end workflows in the open‑source computational fluid dynamics (CFD) package OpenFOAM. The authors build on existing tool‑using agents (specifically the OpenCode framework) and introduce a lightweight system prompt that steers the agent toward two key behaviors: (1) “tutorial‑first” reuse, where the agent searches the local OpenFOAM tutorial directory, selects a case that closely matches the task description, copies it, and makes only minimal edits to dictionaries (e.g., turbulence model, boundary conditions, scaling); and (2) “log‑driven repair,” where the agent parses error messages from OpenFOAM utilities, identifies the exact file and keyword responsible for a failure, and automatically edits the offending configuration before re‑running the workflow.
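The "tutorial‑first" step can be pictured as a search, rank, and copy over the local tutorial tree. The sketch below is a rough illustration, not the paper's implementation: in the paper the agent makes this choice through its own tool calls, whereas here a simple token overlap between the task text and each case path stands in for that judgment. The function names (`tokenize`, `find_cases`, `rank_cases`, `clone_best_case`) are hypothetical, and the rule that a directory is a case when it contains `system/controlDict` is an assumption about the tutorial layout.

```python
import re
import shutil
from pathlib import Path


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; camelCase names such as simpleFoam are split first."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def find_cases(tutorial_root: Path) -> list[Path]:
    """Assumption: a directory is a case if it contains system/controlDict."""
    return sorted(p.parent.parent for p in tutorial_root.rglob("system/controlDict"))


def rank_cases(task: str, tutorial_root: Path) -> list[tuple[int, Path]]:
    """Score each case by how many task tokens appear in its relative path."""
    task_tokens = tokenize(task)
    scored = []
    for case in find_cases(tutorial_root):
        rel = case.relative_to(tutorial_root)
        scored.append((len(task_tokens & tokenize(str(rel))), case))
    # Highest score first; ties keep the alphabetical order from find_cases.
    return sorted(scored, key=lambda item: -item[0])


def clone_best_case(task: str, tutorial_root: Path, destination: Path) -> Path:
    """Copy the best-matching tutorial to `destination` and return its source path."""
    ranked = rank_cases(task, tutorial_root)
    if not ranked or ranked[0][0] == 0:
        raise LookupError("no tutorial case shares any token with the task")
    best = ranked[0][1]
    shutil.copytree(best, destination)  # destination must not exist yet
    return best
```

After the copy, the minimal dictionary edits described above would be applied to the files under `destination`; that step is left out because it depends on the specific task.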

The experimental evaluation uses the FoamBench‑Advanced subset of the CFDLLMBench benchmark, which contains 16 expert‑authored tasks. These are divided into nine “tutorial‑derivative” cases that can be solved by small modifications to existing tutorials, and seven planar 2‑D obstacle‑flow cases that require geometry creation and mesh generation beyond simple tutorial copying.

For the tutorial‑derivative set, the authors compare two prompt configurations using the same backbone model (MiniMax‑M2.1). Under the default OpenCode prompt, only 4 of 9 runs reach the required end time (M_exec ≈ 44%). With the CFD‑specific prompt, all nine runs complete successfully (M_exec = 100%). The prompt also improves structural similarity (mean M_struct = 0.986) and file‑level similarity (mean M_file = 0.919) to the reference solutions, and 7 of 9 cases achieve NMSE < 0.1. Token usage and tool‑call counts drop noticeably; the CFD prompt leads to fewer write operations and more read operations, reflecting the strategy of copying and editing an existing tutorial rather than generating files from scratch.
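To make the reported numbers concrete, here is a small sketch of how an execution rate and a normalized mean squared error could be computed. The summary does not give the benchmark's exact formulas, so both definitions are assumptions: `execution_rate` is the fraction of runs that reach the required end time, and `nmse` uses one common normalization (squared error divided by the squared magnitude of the reference field). The benchmark may normalize differently.

```python
from typing import Sequence


def execution_rate(reached_end_time: Sequence[bool]) -> float:
    """Fraction of runs that reached the required end time (assumed meaning of M_exec)."""
    if not reached_end_time:
        raise ValueError("need at least one run")
    return sum(reached_end_time) / len(reached_end_time)


def nmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Assumed definition: sum((p - r)^2) / sum(r^2) over matching sample points."""
    if len(predicted) != len(reference):
        raise ValueError("fields must have the same number of samples")
    denominator = sum(r * r for r in reference)
    if denominator == 0:
        raise ValueError("reference field is identically zero")
    numerator = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return numerator / denominator


# 4 of 9 runs completing corresponds to the default-prompt figure quoted above.
default_prompt_runs = [True] * 4 + [False] * 5
print(f"M_exec = {execution_rate(default_prompt_runs):.0%}")
```

Under this definition, a case passes the NMSE < 0.1 threshold when the squared error is below 10% of the reference field's squared magnitude.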

The obstacle‑flow set highlights the importance of model capability. Using MiniMax‑M2.1, the agent fails to generate appropriate multi‑block meshes with blockMesh; in several cases it reuses the copied tutorial mesh without adjusting the obstacle geometry, so the resulting domain does not contain the intended object. Only one case produces a usable mesh, and it does so by falling back to snappyHexMesh rather than the intended blockMesh workflow. When the backbone model is switched to GPT‑5.2, mesh generation improves markedly. GPT‑5.2 correctly selects between blockMesh and snappyHexMesh, respects the specified obstacle dimensions, and successfully completes all four evaluated obstacle‑flow cases. This suggests that, for geometry‑heavy CFD tasks, performance is limited mainly by the underlying language model’s code‑generation and domain‑reasoning abilities.
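The multi‑block requirement is easier to see with a small geometric sketch. Assuming, purely for illustration, that the obstacle is an axis‑aligned rectangle, a structured mesh around it can be built from a 3×3 grid of rectangles with the centre cell left out, which gives eight fluid blocks. The function below (a hypothetical helper, not taken from the paper) computes those block extents and rejects an obstacle that does not lie inside the domain, which is the failure mode described above. It does not write a blockMeshDict; vertices, cell counts, grading, and patches are out of scope here.

```python
from itertools import product

Rect = tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def blocks_around_obstacle(domain: Rect, obstacle: Rect) -> list[Rect]:
    """Split `domain` into the 8 rectangles surrounding a rectangular `obstacle`."""
    x0, x1, y0, y1 = domain
    ox0, ox1, oy0, oy1 = obstacle
    if not (x0 < ox0 < ox1 < x1 and y0 < oy0 < oy1 < y1):
        raise ValueError("obstacle must lie strictly inside the domain")
    xs = [x0, ox0, ox1, x1]
    ys = [y0, oy0, oy1, y1]
    blocks = []
    for i, j in product(range(3), range(3)):
        if (i, j) == (1, 1):
            continue  # the centre cell is the solid obstacle, so it gets no block
        blocks.append((xs[i], xs[i + 1], ys[j], ys[j + 1]))
    return blocks


def area(rect: Rect) -> float:
    """Area of an axis-aligned rectangle."""
    return (rect[1] - rect[0]) * (rect[3] - rect[2])
```

A quick sanity check for any generated layout is that the block areas sum to the domain area minus the obstacle area, which catches both overlapping blocks and a missing obstacle cut‑out.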

Key insights from the study are:

A minimal, well‑crafted system prompt can transform a generic coding agent into a practical CFD automation tool without any additional fine‑tuning or multi‑agent orchestration.
Reusing validated tutorial cases dramatically reduces configuration errors, tool‑call overhead, and token consumption, leading to higher reliability for tasks that are close to existing examples.
Log‑driven iterative repair is effective because OpenFOAM error messages are highly specific, pointing to the exact dictionary entry that needs correction.
For tasks that require geometry creation and complex meshing, the language model’s intrinsic capability is the dominant factor; a stronger model (GPT‑5.2) handles these tasks, whereas MiniMax‑M2.1 struggles with them.
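The log‑driven repair insight can be illustrated with a short parser that pulls the missing keyword and the reported file location out of a solver log. This is a sketch under stated assumptions: `SAMPLE_LOG` is hand‑written in the style of an OpenFOAM fatal IO error rather than captured output, the exact wording differs between OpenFOAM versions, and the regular expressions cover only this one message shape. The `Diagnosis` class and `diagnose` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative text only; real messages vary by OpenFOAM version and error type.
SAMPLE_LOG = """\
Create mesh for time = 0

--> FOAM FATAL IO ERROR:
keyword div(phi,U) is undefined in dictionary "/case/system/fvSchemes.divSchemes"

file: /case/system/fvSchemes.divSchemes from line 30 to line 35.

FOAM exiting
"""

KEYWORD_RE = re.compile(r"keyword\s+(\S+)\s+is undefined")
FILE_RE = re.compile(r"^file:\s+(\S+)(?:\s+from line\s+(\d+))?", re.MULTILINE)


@dataclass
class Diagnosis:
    keyword: Optional[str]     # entry the solver could not find
    location: Optional[str]    # file (plus sub-dictionary suffix) as reported
    first_line: Optional[int]  # first line of the reported range, if given


def diagnose(log_text: str) -> Optional[Diagnosis]:
    """Return what the first fatal error points at, or None if the log has none."""
    start = log_text.find("FOAM FATAL")
    if start == -1:
        return None
    tail = log_text[start:]
    keyword = KEYWORD_RE.search(tail)
    location = FILE_RE.search(tail)
    return Diagnosis(
        keyword=keyword.group(1) if keyword else None,
        location=location.group(1) if location else None,
        first_line=int(location.group(2)) if location and location.group(2) else None,
    )


print(diagnose(SAMPLE_LOG))
```

In a repair loop, the returned keyword and location would tell the agent which dictionary entry to add or correct before re‑running the failed utility; how the edit itself is chosen is left to the model.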

The authors conclude that coding agents have practical utility for automating portions of OpenFOAM workflows, especially when guided by tutorial‑first and log‑repair strategies. However, they also note limitations: the current approach does not yet address post‑processing, visualization, or multi‑case parametric studies, and it relies on the presence of suitable tutorial cases for the “reuse” step. Future work is suggested in three directions: (a) extending the prompt to cover pre‑ and post‑processing stages, (b) integrating automatic mesh‑quality assessment and dynamic re‑meshing loops, and (c) generalizing the methodology to other CFD platforms such as SU2 or ANSYS Fluent. Overall, the paper offers preliminary evidence that lightweight prompt engineering, combined with capable LLMs, can help connect AI‑assisted code generation to real‑world scientific computing pipelines.
