Prompts Blend Requirements and Solutions: From Intent to Implementation
Authors: Shalini Chakraborty, Jan-Philipp Steghöfer
Shalini Chakraborty, University of Bayreuth, Bayreuth, Germany (shalini.chakraborty@uni-bayreuth.de, ORCID 0000-0002-9466-3766)
Jan-Philipp Steghöfer, XITASO GmbH IT and Software Solutions, Augsburg, Germany (jan-philipp.steghoefer@xitaso.com, ORCID 0000-0003-1694-0972)

March 18, 2026

Abstract

AI coding assistants are reshaping software development by shifting focus from writing code to formulating prompts. In chat-focused approaches such as vibe coding, prompts become the primary arbiter between human intent and executable software. While Requirements Engineering (RE) emphasizes capturing, validating, and evolving requirements, current prompting practices remain informal and ad hoc. We argue that prompts should be understood as lightweight, evolving requirement artifacts that blend requirements with solution guidance. We propose a conceptual model decomposing prompts into three interrelated components: Functionality and Quality (the requirement), General Solutions (architectural strategy and technology choices), and Specific Solutions (implementation-level constraints). We assess this model using existing prompts, examining how these components manifest in practice. Based on this model and the initial assessment, we formulate four hypotheses: prompts evolve toward specificity, evolution varies by user characteristics, engineers using prompting engage in increased requirement validation and verification, and progressive prompt refinement yields higher code quality. Our vision is to empirically evaluate these hypotheses through analysis of real-world AI-assisted development, with datasets, corpus analysis, and controlled experiments, ultimately deriving best practices for requirements-aware prompt engineering.
By rethinking prompts through the lens of RE, we position prompting not merely as a technical skill, but as a central concern for software engineering's future.

1 Introduction

AI-assisted chat-based coding tools fundamentally reshape software development by enabling developers to express functionality, constraints, and behavior in natural language rather than code. In emerging paradigms like vibe coding [18] and agentic coding [29, 20], natural language prompts become the primary artifact bridging human intent and executable software. Unlike traditional workflows that maintain clear divisions between requirements, design, and implementation, chat-based development collapses these boundaries: a single prompt encapsulates requirements, architectural strategy, and implementation constraints simultaneously. Yet prompts remain informal, ephemeral, and ad hoc, lacking the rigor traditionally associated with Requirements Engineering (RE). We argue that prompts should be understood as lightweight, evolving requirement artifacts that blend requirements with solution guidance, and that systematic prompt engineering represents a new frontier for RE in the age of generative AI. This paper builds on the prompt triangle [6] as an emerging result, a conceptual model decomposing prompts into three interrelated dimensions: (1) Functionality and Quality (the requirement), (2) General Solution (architectural strategy and technology choices), and (3) Specific Solution (implementation-level constraints). We validate this model using the DevGPT dataset [31], position it relative to existing prompt patterns and RE validation techniques, and formulate four testable hypotheses about prompt evolution, user characteristics, requirements verification in iterative prompting, and the relationship between prompt refinement and code quality.
Our vision is to establish empirically informed best practices for requirements-aware prompt engineering, rejuvenating RE for AI-assisted development.

2 Chat-based Coding Assistants

AI coding assistants operate in two distinct modes that shape developer interaction in fundamentally different ways. Code completion interfaces work within existing code structures, assisting with boilerplate, method bodies, and idiomatic patterns where developers provide the architectural skeleton. Chat-based interfaces enable developers to describe functionality, constraints, and solution strategies in natural language, potentially revamping entire codebases from a single prompt. This distinction is methodologically critical [3, 26, 24]: code completion preserves traditional incremental programming, while chat-based prompting externalizes problem-solving into conversational exchanges where a single prompt conflates requirements specification, design rationale, validation and verification criteria, and implementation intent. Conflating the two modes risks masking key differences in cognitive effort, developer mental models, and the role of requirements [30]. This paper focuses on chat-based interfaces and their implications for requirements engineering.

3 Prompts and Requirements

Prompt engineering refers to the practice of carefully crafting and refining prompts to guide large language models (LLMs) toward producing desired outputs [19, 7]. In the context of software development, prompt engineering is increasingly seen as a crucial skill for effectively leveraging chat-based coding assistants. However, while prompts may contain elements of system requirements or design intent, they are often ad hoc and lack the structure of formal requirements [3]. Many developers struggle to provide the right level of detail and to translate requirements into prompts consistently [27].
This raises an important question for RE: How can the rigor of RE be infused into prompting practices? Viewing prompts as lightweight but evolving requirement artifacts offers a way to bridge the gap between informal developer intentions and structured requirements specification [28]. Such an integration will enhance the reliability of AI-generated solutions and redefine RE practices in the era of AI-assisted chat-based development.

3.1 A Conceptual Model of Prompts

Recent studies have begun to explore the role of prompts in software development and requirements engineering [22, 12]. We argue, however, that to fully understand how prompts function in AI-assisted chat-based development, particularly in emerging practices such as vibe coding (iterative, conversational development through natural language prompts [16]) or agentic coding (autonomous AI agents that reason about and execute development tasks), where the prompt serves as the primary interaction medium between requirements and the resulting software, it is essential to consider their structure from an RE perspective. To this end, we propose a conceptual model that decomposes a prompt into three interrelated components:

• Functionality and Quality: Captures the core functionality or quality, i.e., what the developer wants the system to do. We include both functional requirements and quality requirements in this category.

• General Solutions: Guides how the functionality should be realized, including preferred technologies, paradigms, or architectural patterns.

• Specific Solutions: Detailed, low-level implementation instructions or constraints that tailor the solution space further.

Together, these elements form the Prompt Triangle (Figure 1), a practical framing that helps authors structure prompts and assess completeness.
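To make this decomposition concrete, the three components can be captured in a small prompt template. The following is a minimal sketch; the class, field, and label names are ours and purely illustrative, not an established format:

```python
from dataclasses import dataclass

@dataclass
class TrianglePrompt:
    functionality: str           # what the system must do, incl. quality criteria
    general_solution: str = ""   # technology, paradigm, or architectural guidance
    specific_solution: str = ""  # low-level implementation constraints

    def render(self) -> str:
        # Emit only the components the author filled in, in triangle order.
        parts = [f"Requirement: {self.functionality}"]
        if self.general_solution:
            parts.append(f"Approach: {self.general_solution}")
        if self.specific_solution:
            parts.append(f"Constraints: {self.specific_solution}")
        return "\n".join(parts)

# Keeping the components separate lets a prompt author hold the
# Functionality fixed while varying the General Solution.
base = ("Show the current weather for a given city; load quickly on a "
        "phone and handle network failures gracefully.")
variants = [TrianglePrompt(base, general_solution=g)
            for g in ("React with server-side rendering",
                      "Plain HTML page backed by a small Flask API")]
```

Such a template is one possible operationalization; nothing in the model mandates a particular serialization of the components into prompt text.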
In practice, prompts oscillate between these components across iterations, expressing intent, refining solution directions, and constraining implementation. This perspective connects to ongoing RE work on how prompting intersects with requirements capture and evolution [12]. Quality requirements (or non-functional requirements, NFRs) such as performance, reliability, security, accessibility, and usability should be co-specified with the Functionality so the prompt captures both what the system does and how well it must do it. We suggest stating them as measurable acceptance criteria appended to the functionality description, while leaving approach guidance to the General and Specific Solution components. Co-locating quality requirements with the functionality stabilizes the core intent and makes validation and verification explicit within the prompt.

Figure 1: Prompt Triangle: prompts composed of Functionality and Quality (the requirement the AI needs to translate to a solution and the qualities the solution needs to have), General Solution (information about how the AI is supposed to solve the problem, such as the technology to use or architectural patterns), and Specific Solution (concrete implementation-level instructions or constraints relevant to the final solution).

Illustrative Example. Consider a prompt such as:

Build a React web app that shows the current weather for a given city, and make it fast, reliable, safe, and accessible. Use server-side rendering; fetch data from OpenWeather. Style the page cleanly and show city, condition, humidity, wind, and Celsius temperature with an icon. It should load quickly on a typical phone, show a loading state, retry if the network fails, and display clear errors.
Keep the API key on the server, avoid logging personal details, and ensure people can use the app with a keyboard and screen readers with readable colors.

This prompt mixes the different concerns of the Prompt Triangle, but its elements can be assigned to the triangle's corners:

• Functionality and Quality: Build a web app that shows the current weather for a given city; loads quickly on a typical phone; shows a clear loading state; handles slow or failing network connections gracefully; don't record personal details in logs; make the app usable with keyboard and screen readers; ensure text and colors are easy to read.

• General Solution: Prefer React; use server-side rendering; fetch current weather from OpenWeather.

• Specific Solution: Use a clean, modern style; show city name, condition, humidity, wind, and temperature in Celsius with a small icon; keep the API key out of the browser and store it safely on the server.

We believe that this triangular model provides a lens through which we can examine prompts not just as one-off commands, but as evolving artifacts of requirements expression, negotiation, and realization which capture the main intentions of the developer [15]. As prompts are iteratively refined, attention may oscillate between specifying what is needed (Functionality and Quality), how it should be approached (General Solutions), and how exactly it should be executed (Specific Solutions).

Model Validation. To validate this conceptual model, we have investigated a large number of examples of prompts from blog posts and from the DevGPT dataset [31]. Some examples are shown in Table 2. In addition, we extracted a sample of 120 prompts from the DevGPT dataset and asked Claude Sonnet 4.5 to categorize the information in them. We used a script to randomly extract initial prompts from conversations in 20230803_095317_commit_sharings.json and 20231012_233628_pr_sharings.json.
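The extraction step might look roughly like the following sketch. The JSON field names ("Sources", "ChatgptSharing", "Conversations", "Prompt") reflect our reading of the DevGPT sharing-file layout and may differ between dataset snapshots:

```python
import json
import random

def sample_initial_prompts(path: str, k: int, seed: int = 42) -> list[str]:
    """Randomly sample the opening prompt of shared conversations.

    Assumes a DevGPT-style layout: a top-level "Sources" list whose
    entries carry "ChatgptSharing" records, each holding a
    "Conversations" list of {"Prompt": ..., ...} turns.
    """
    with open(path, encoding="utf-8") as f:
        sources = json.load(f).get("Sources", [])
    first_turns = [
        sharing["Conversations"][0]["Prompt"]
        for source in sources
        for sharing in source.get("ChatgptSharing", [])
        if sharing.get("Conversations")  # skip sharings without a dialogue
    ]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(first_turns, min(k, len(first_turns)))
```

Only the first turn of each conversation is kept, matching our focus on initial prompts rather than prompt evolution within a session.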
All prompts used are available in the replication package [1]. We used only the initial message in a conversation as we are, at this stage, not interested in the evolution of the prompting during a conversation but rather only in validating the structure of our conceptual model. We checked the categorization provided by the LLM manually and did not find misclassifications.

Our analysis, shown in Table 1, demonstrates a strong tendency toward structured, multi-component prompts. Requirements appeared in 98.3% of prompts, establishing them as the foundational element of prompt construction. The predominant pattern (n = 64, 53.3%) incorporated all three components, indicating that users frequently provide comprehensive specifications of the solution. General solutions (76.7%) are more commonly included than specific solutions (63.3%), suggesting that users prefer constraining the solution while preserving implementation flexibility. Co-occurrence analysis shows that requirements with at least one solution component accounted for 85% of prompts, while solution components appeared without requirements in only two cases (1.7%).¹ This asymmetry supports a way of thinking where specifying intent precedes the prescription of a solution. The secondary pattern of requirement-plus-general-solution prompts (n = 27, 22.5%) further supports a two-stage refinement process from "what" to "how" that may optionally extend to specific implementation details. Interestingly, 16 prompts (13.3%) contain only requirements and no solution component. These are often requests to translate some text or simple questions about code snippets.

¹ One of these prompts was a clarification of the behaviour of a certain library; the second prompt in that conversation instructed the LLM to reformulate text. The other prompt was source code without additional instructions.
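The aggregate figures in this analysis follow mechanically from per-prompt component labels. A minimal sketch of the bookkeeping, with the labels stubbed here to reproduce our counts (in the real analysis each label set comes from the manually checked LLM categorization):

```python
from collections import Counter

# Stubbed per-prompt labels reproducing the reported counts (N = 120).
labels = ([frozenset({"req", "gen", "spec"})] * 64
          + [frozenset({"req", "gen"})] * 27
          + [frozenset({"req", "spec"})] * 11
          + [frozenset({"gen", "spec"})] * 1
          + [frozenset({"req"})] * 16
          + [frozenset({"spec"})] * 1)

n = len(labels)
patterns = Counter(labels)                        # pattern -> number of prompts
present = {c: sum(c in lab for lab in labels)     # per-component presence
           for c in ("req", "gen", "spec")}
# Requirements co-occurring with at least one solution component.
req_with_solution = sum(1 for lab in labels
                        if "req" in lab and lab & {"gen", "spec"})
print(present, round(100 * req_with_solution / n, 1))
```

Running this recovers the presence totals (118, 92, 77) and the 85% co-occurrence figure discussed above.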
Table 1: Prompt Component Analysis (N = 120)

Pattern        Req   Gen  Spec      n      %
All Three       ✓     ✓     ✓      64   53.3
Req + Gen       ✓     ✓            27   22.5
Req + Spec      ✓           ✓      11    9.2
Gen + Spec            ✓     ✓       1    0.8
Req Only        ✓                  16   13.3
Gen Only              ✓             0    0.0
Spec Only                   ✓       1    0.8
Total          118    92    77    120  100.0
Present (%)  (98.3)(76.7)(63.3)

3.2 Prompting as a Tool for Requirement Validation and Verification

Requirements engineering traditionally distinguishes between validation (ensuring we are building the right thing) and verification (ensuring we are building it right). In the context of AI-assisted chat-based development, structured prompts offer a novel mechanism to support both activities. By explicitly articulating Functionality and Quality, General Solutions, and Specific Solutions, prompts act as evolving artifacts of stakeholder intent and technical interpretation. Requirement validation occurs as the functionality component is refined in light of feedback and gradually reconciled with specific solutions, ultimately aligning the prompt with stakeholder needs. Meanwhile, requirement verification is observable in the iterative interactions and adjustments between the three prompt components, which mirror the ongoing process of checking that the system is being built correctly. Thus, prompts not only guide the AI in code generation; they serve as dynamic roadmaps for both validating what should be built and verifying how it is implemented. This bears similarities to classical requirement validation and verification techniques. Atoum et al.'s recent systematic review [2] categorizes these into six approaches. Our method resembles prototyping, inspection, and testing-oriented techniques. In terms of validation by inspection, the Functionality and Quality part of the Prompt Triangle serves a similar role to traditional requirements documents subjected to inspection.
However, there are two distinctions. First, rather than discrete review meetings, our approach enables continuous, AI-mediated inspection through the iterative refinement between Functionality and Specific Solutions. The AI acts as an automated consistency analyzer that surfaces conflicts when it cannot reconcile high-level intent with implementation details, though the effectiveness of this detection depends on the AI's interpretive capabilities and on how explicit the formulation in the prompt is. Second, we expect the Functionality and Quality part of a prompt to be much more limited than a requirements document. An important part of inspections are the relationships between requirements [17]. We do not expect such relationships to become visible in individual prompts, but potentially as part of longer conversations.

Prototyping as a requirement validation technique was introduced by Boehm in 1984 [4]. In prototyping, working models of the software are created to validate that requirements meet stakeholder needs. One of the strengths of the Prompt Triangle is that it separates the solution components from the Functionality and Quality parts, thus enabling rapid iteration between the Functionality and multiple solution formulations. Unlike traditional prototyping, which requires substantial development effort to explore alternative designs, the Prompt Triangle allows developers to experiment with different approaches in the General Solutions and Specific Solutions components while holding Functionality constant, generating executable prototypes within seconds rather than days [5]. When the AI fails to generate satisfactory code from a given Specific Solution, this failure signals either requirement ambiguity or an incorrect solution approach.
Notably, this mechanism also works in reverse: when the AI successfully generates code that nonetheless fails to meet stakeholder needs upon review, this reveals gaps in the Functionality specification rather than implementation errors, a form of validation through executable prototyping.

In verification via testing-oriented approaches, prompts inherently produce testable artifacts. By maintaining explicit traceability between Functionality (what to test) and Specific Solutions (how it is implemented) within the same artifact, the Prompt Triangle enables test generation that simultaneously validates stakeholder intent and verifies implementation correctness. However, the quality of generated tests depends critically on whether the AI grasps the essential test scenarios implied by the Functionality component and the edge cases introduced by the Specific Solutions. This dual focus is difficult to achieve when requirements and code exist as separate artifacts. In many cases, developers instruct AIs to generate test cases for each new functionality automatically. Research on automated test case generation with LLMs is just emerging (see, e.g., [23]), but combining requirement and solution approaches in a prompt can help guide the generation towards more targeted tests of the technical solution.

Table 2: Example prompts with classification of the information contained in them to the components of the prompt triangle.

1. "Write a Python implementation of merge sort optimized for memory efficiency with time complexity analysis and error handling for edge cases including empty arrays." [source 1]
   Functionality and Quality: Sort input values; provide time complexity analysis; handle error/edge cases; optimized for memory efficiency. General Solution: Use merge sort algorithm; implement in Python. Specific Solution: none.

2. "I want this game to rely on local storage to remember who I am and who my picks were in previous contests. A contest is January, March, May, July, September, or November of a given year. The current contest is July 2023. We will assume I am in admin mode and I can switch users to record everyone's picks (which are visible to everyone) and backfill old results. Please add at least one new test." [source 3]
   Functionality and Quality: Remember who I am and who my picks were in previous contests; switch users as admin; record picks; backfill old results; display picks to everyone. General Solution: Persist data using local storage. Specific Solution: none.

3. "Generate a Python function using pandas to: Read a CSV file. Drop null values. Convert a date column to datetime format. Group data by category and calculate average values." [source 5]
   Functionality and Quality: Read CSV file; drop null values; convert date column; group data by category; calculate average values. General Solution: Use pandas library; implement in Python. Specific Solution: Read with pandas.read_csv.

4. "Write a FastAPI endpoint that: Accepts a POST request with JSON data ({name: str, age: int}). Validates the input. Returns a success message with the received data." [source 5]
   Functionality and Quality: Validate input; return success message. General Solution: Use FastAPI framework; implement in Python. Specific Solution: Define a POST endpoint; accept JSON with {name: str, age: int}.

5. "Create a Python script using BeautifulSoup to: Fetch HTML from 'example.com/news'. Extract all headlines (h2 tags). Export results to a JSON file." [source 5]
   Functionality and Quality: Fetch HTML; extract data from HTML; export results to JSON. General Solution: Use BeautifulSoup library; implement in Python. Specific Solution: Extract all h2 tags.

6. "Generate Python code to: Load the Iris dataset. Split into train/test sets. Train a RandomForestClassifier. Print accuracy metrics and a confusion matrix." [source 5]
   Functionality and Quality: Train model; split into train/test sets; print accuracy metrics; print confusion matrix. General Solution: Use RandomForestClassifier; implement in Python. Specific Solution: Load Iris dataset.

7. "Write a Python script using pathlib to: List all files in './downloads'. Create folders for each file extension. Move files to their respective folders. Skip files without extensions." [source 5]
   Functionality and Quality: List files in directory; create folders for file extensions; move files by extension; skip files without extensions. General Solution: Implement in Python. Specific Solution: Use pathlib to enumerate files in './downloads'.

8. "Create a TypeScript function that validates email addresses with the following requirements: Must be RFC 5322 compliant. Rejects disposable email domains. Returns detailed error messages. Example of expected function signature and usage: function validateEmail(email: string): { isValid: boolean; message: string }" [source 1]
   Functionality and Quality: Validate input; RFC 5322 compliance; reject disposable email domains; return detailed error messages. General Solution: Use TypeScript. Specific Solution: validateEmail(email: string).

Sources:
1 https://margabagus.com/prompt-engineering-code-generation-practices/
2 https://github.com/p3ob7o/Speak/commit/01cec3e3d17e26f703ce8bf7aa068d3f6b6364d3 (via DevGPT)
3 https://github.com/hoshotakamoto/banzukesurfing/commit/63b2ab90b0b138e509e87efad59fd72b414d0133 (via DevGPT)
4 https://mitchellh.com/writing/non-trivial-vibing
5 https://zencoder.ai/blog/vibe-coding-prompts

3.3 Prompt Patterns and Prompt Iterations

Recent studies analyze prompt patterns, but none treat prompts as blends of requirements and solutions. DiCuffa et al.
[9] identify seven patterns, such as persona ("you are", "pretend to be") or recipe ("step-by-step", "guide"), focusing on structural conventions rather than semantic content. Siddiq et al. [25] investigate quality issues with developer prompts, finding they often suffer from ambiguity and insufficient context for producing high-quality output. While these anti-patterns are complementary to our findings, the authors do not propose a conceptual model distinguishing requirement from solution elements. Della Porta et al. [8] analyze how prompt patterns affect code quality across three dimensions: meta-prompting (zero-shot and few-shot), chain-of-thought, and personas. They find no statistically significant differences between combinations, but, like others, do not examine prompt content systematically. Huang et al. [12] specifically address prompt engineering for requirements engineering, exploring techniques including multimodal prompting (e.g., UI mockups) and self-reflection prompts where LLMs provide confidence values. They propose using generated code to validate requirements, yet do not distinguish between requirement and solution elements within prompts themselves. Beyond individual prompts, Rubino [21] proposes "conversation routines" to structure extended human-LLM dialogues through system prompts, applicable to software development contexts. Our prompt triangle provides the first systematic model decomposing prompts into requirement (Functionality and Quality) and solution (General, Specific) dimensions, enabling analysis of how these elements evolve through iterative refinement.

4 Hypotheses

To understand how developers externalize, evolve, and validate requirements through prompts when working with chat-based AI coding assistants, our approach is grounded in the observation that a prompt can be decomposed into three distinct but interrelated parts (see Figure 1).
Based on this framework, we formulate the following hypotheses:

H1 (Prompt Evolution): During multi-turn AI-assisted coding sessions, prompts exhibit a statistically significant increase in the percentage of Specific Solutions content compared to General Solutions and Functionality and Quality, as assessed by textual decomposition.

H2 (User-driven Prompt Strategy): The distribution of Prompt Triangle components is significantly associated with developer experience and domain familiarity, with more experienced users specifying more Specific Solutions.

H3 (Requirement Validation and Verification): During iterative prompt refinement sessions, developers perform activities that correspond to classical validation and verification techniques (requirements review, prototyping, test case generation) at rates significantly higher than in traditional development sessions without AI assistance, as measured by activity coding of session transcripts.

H4 (Validation Through Progressive Refinement): Conversations that separate validation (establishing stable Functionality and Quality early) from verification (iteratively refining the General Solution and then the Specific Solution) produce higher-quality code with significantly fewer requirement-implementation misalignments, better static analysis scores, and reduced need for post-generation corrections compared to conversations where all components evolve simultaneously.

5 Discussion

The power of chat-based interfaces is most obvious in vibe coding. Meske et al. [16] define vibe coding as "the culmination of a historical process of intent mediation": the level of abstraction required to implement solutions has evolved from hardware manipulation through conceptual modeling to prompting, shifting the mediation of developer intent "from deterministic instruction to probabilistic inference". To guide this inference, "goal-oriented intent expression" via prompts is necessary. Crucially, Meske et al.
consider actual implementation details irrelevant. However, in the vibe coding prompts we have studied, solution details as captured in our Prompt Triangle are almost always present. This discrepancy suggests either that current practice remains anchored to solution-level control due to developer trust issues or LLM limitations, or that Meske et al.'s vision represents an idealized future state that is not yet achievable. Our Prompt Triangle embraces this reality by explicitly accommodating solution guidance alongside functional intent, rather than treating solution details as impurities to eliminate. As LLMs mature, we hypothesize that the relative weight of Specific Solutions may decrease, but the tripartite structure will remain relevant as developers will always need some architectural control. Our work on H1 and H2 will identify how developers evolve prompts during their work, while H3 and H4 will reveal how requirements and solutions co-evolve and whether this strategy benefits code quality.

Taking this further, OpenAI's Sean Grove proposes that prompts become executable specifications, arguing future programmers will write elaborate specifications that generative AIs turn into complex programs.² Counter-arguments exist: executable specifications have long existed [11, 10], and making specifications fully executable constitutes programming itself. Moreover, unlike formal programming languages, natural language is ambiguous. Jackson considers this ambiguity a strength rather than a weakness for natural language reasoning in AI [13]. LLMs arguably realize his vision by connecting natural language to vector space concepts. However, Jackson envisioned reasoning AIs, not code generators: the ability to "understand the world" via natural language does not directly translate to reproducibly creating programs fulfilling user needs.
Much requirements engineering work addresses disambiguating unclear requirements [14]. Our work addresses this conflict directly: we study how requirements are captured in prompts with all their ambiguity and how engineers constrain generation by providing solution approaches. We believe prompt requirements must evolve through disambiguation and scaffolding. H1 and H3 will illuminate this process, while H4 will reveal the impact on code quality when validation and disambiguation happen early.

The tool landscape validates our premise. Kiro,³ currently in preview, promises to turn "your prompt into clear requirements, system design, and discrete tasks." Here, prompts come first and requirements after, validating that requirements are inherent in prompts alongside solution aspects. Kiro's value proposition is extracting and disambiguating these elements.

6 Plans for Empirical Validation

To empirically test our hypotheses, we envision a multi-phase research agenda combining corpus construction, corpus analysis, and controlled experiments. Our future work will proceed along the following lines:

Corpus Construction. As a foundational step, we will construct a dataset of real-world conversations from vibe coding sessions and AI coding assistant interactions, annotated according to our conceptual model (Functionality, General Solutions, Specific Solutions) and shared publicly. It will also include data about developer experience and domain familiarity gathered via a questionnaire. Our dataset will extend DevGPT [31] with this information about the developers as well as with modern prompting styles and contextual metadata including interaction mode (IDE chat interface, standalone chatbot) and development approach (vibe coding, structured development).

Corpus Analysis (H1 and H2).
To investigate how prompts evolve during a session (H1) and how developer-specific factors shape prompting strategies (H2), we plan to analyse multi-turn sessions from the corpus. We will analyze the sequence of prompts, code artifacts, and revisions to model the dynamics of prompt evolution and to correlate them with human factors such as experience level and familiarity with the domain.

Controlled Experiments (H3 and H4). To investigate requirements validation and verification (H3), we will design tasks where developers iteratively refine their prompts in response to AI-generated code, and we will measure the stability of the Functionality and Quality aspect over time. This will allow us to operationalize requirement stability as an indicator of verification and validation. To evaluate the relationship between progressive refinement and code quality (H4), we will conduct controlled experiments where developers are assigned tasks with varying degrees of guidance on how to structure and evolve their prompts. Generated code will be assessed along the dimensions of correctness, maintainability, and alignment with requirements.

Guidelines, Best Practices, and Developer Support. The insights gained from these empirical studies will inform the development of best practices for requirements-aware prompt engineering. The guidelines will bridge the gap between informal, ad hoc prompting and the structured rigor of Requirements Engineering, ultimately contributing to a more reliable and systematic use of AI coding assistants. This program of work tests our hypotheses and positions prompting as a central artifact in the future of Requirements Engineering. By combining observational, experimental, and design-oriented methods, we aim to advance both theoretical understanding and practical guidance in this emerging field. The Prompt Triangle framework transforms IDEs from passive code editors into active requirements validation partners.
Current IDEs track changes but cannot critique prompt quality, e.g., "your prompt specifies implementation details without stating what problem you're solving" or "your quality requirements contradict your chosen solution approach." By structuring prompts into Functionality and Quality, General Solutions, and Specific Solutions, IDEs gain the analytical foundation to provide actionable feedback: highlighting when prompts are solution-heavy but requirement-light (risk: building the wrong thing efficiently), detecting inconsistencies between components (e.g., "you require high performance but specified a naive algorithm"), and visualizing how requirements stabilize across conversation turns. Just as architects validate designs by examining requirement-solution coherence, our framework enables IDEs to computationally perform this validation during prompt authoring, catching requirement issues before code generation.

² https://www.youtube.com/watch?v=8rABwKRsec4
³ https://kiro.dev/

7 Conclusion

Chat-based development shifts the developer's role from writing code to crafting prompts, positioning prompts as the central artifact that blends requirements with solution strategies. We propose structuring prompts into Functionality, General Solutions, and Specific Solutions, and formulate hypotheses on their evolution, user-driven strategies, code quality, and requirement stability. Our future work will test these hypotheses, ultimately deriving best practices and IDE support for requirements-aware prompt engineering. We urge the RE community to treat prompting as a central concern for the future of software engineering.

Acknowledgments

Perplexity.ai and Claude Sonnet 4.5 were used to critique drafts of the manuscript, refine hypotheses, and elicit feedback on different aspects of the conceptual model.
Claude Sonnet 4.5 was used to classify information in prompts into the three areas of the triangle and generate the corresponding tables. In addition, it created some of the text passages in this work.

References

[1] Anonymous. Dataset for "Prompts Blend Requirements and Solutions: From Intent to Implementation", February 2026. https://doi.org/10.5281/zenodo.18713272.
[2] Issa Atoum, Mahmoud Khalid Baklizi, Izzat Alsmadi, Ahmed Ali Otoom, Taha Alhersh, Jafar Ababneh, Jameel Almalki, and Saeed Masoud Alshahrani. Challenges of software requirements quality assurance and validation: A systematic literature review. IEEE Access, 9:137613–137634, 2021.
[3] Shraddha Barke, Michael B. James, and Nadia Polikarpova. Grounded Copilot: How programmers interact with code-generating models. OOPSLA, 7:85–111, 2023.
[4] B. W. Boehm. Verifying and validating software requirements and design specifications. IEEE Software, 1(1):75–88, 1984.
[5] Markus Borg, Elizabeth Bjarnason, and Fabian Hedin. Vibe coding and the new prototyping playbook. IEEE Software, 42(6):12–16, 2025.
[6] Shalini Chakraborty and Jan-Philipp Steghöfer. Exploring prompts as mixed requirements and solutions artifacts. In ICSE Companion, 2026.
[7] Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, and Shengxin Zhu. Unleashing the potential of prompt engineering in large language models: A comprehensive review. arXiv preprint arXiv:2310.14735, 2023.
[8] Antonio Della Porta, Stefano Lambiase, and Fabio Palomba. Do prompt patterns affect code quality? A first empirical assessment of ChatGPT-generated code. In EASE, pages 181–192, 2025.
[9] Sophia DiCuffa, Amanda Zambrana, Priyanshi Yadav, Sashidhar Madiraju, Khushi Suman, and Eman Abdullah AlOmar. Exploring prompt patterns in AI-assisted code generation: Towards faster and more effective developer-AI collaboration. In ICMI, pages 1–7. IEEE, 2025.
[10] Norbert E. Fuchs. Specifications are (preferably) executable.
Software Engineering Journal, 7(5):323–334, 1992.
[11] Ian James Hayes and Cliff B. Jones. Specifications are not (necessarily) executable. Software Engineering Journal, 4(6):330–339, 1989.
[12] Kaicheng Huang, Fanyu Wang, Yutan Huang, and Chetan Arora. Prompt engineering for requirements engineering: A literature review and roadmap. In RE Workshops, pages 548–557. IEEE, 2025.
[13] Philip Jackson. Understanding understanding and ambiguity in natural language. Procedia Computer Science, 169:209–225, 2020.
[14] Ibrahim Khalil, Israr Ahmad, Uzair Rasheed, Wasi Haider Butt, and Zaeem Anwaar. Detecting cross-domain ambiguity in requirements through natural language processing: A systematic literature review. In ICACS. IEEE, 2025.
[15] Jacob Krüger, Yi Li, Chenguang Zhu, Marsha Chechik, Thorsten Berger, and Julia Rubin. A vision on intentions in software engineering. In ESEC/FSE 2023, pages 2117–2121, New York, NY, USA, 2023. Association for Computing Machinery.
[16] Christian Meske, Tobias Hermanns, Esther Von der Weiden, Kai-Uwe Loser, and Thorsten Berger. Vibe coding as a reconfiguration of intent mediation in software development: Definition, implications, and research agenda. IEEE Access, 13:213242–213259, 2025.
[17] B. Nuseibeh, J. Kramer, and A. Finkelstein. A framework for expressing the relationships between multiple views in requirements specification. TSE, 20(10):760–773, 1994.
[18] Partha Pratim Ray. A review on vibe coding: Fundamentals, state-of-the-art, challenges and future directions. Authorea Preprints, 2025.
[19] Laria Reynolds and Kyle McDonell. Prompt programming for large language models: Beyond the few-shot paradigm. In CHI, pages 1–7, 2021.
[20] Maxime Robeyns, Martin Szummer, and Laurence Aitchison. A self-improving coding agent. arXiv preprint arXiv:2504.15228, 2025.
[21] Giorgio Robino.
Conversation routines: A prompt engineering framework for task-oriented dialog systems. arXiv preprint arXiv:2501.11613, 2025.
[22] Krishna Ronanki, Simon Arvidsson, and Johan Axell. Prompt engineering guidelines for using large language models in requirements engineering. arXiv preprint, 2025.
[23] Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. An empirical evaluation of using large language models for automated unit test generation. TSE, 50(1):85–105, 2023.
[24] Agnia Sergeyuk, Yaroslav Golubev, Timofey Bryksin, and Iftekhar Ahmed. Using AI-based coding assistants in practice: State of affairs, perceptions, and ways forward. IST, 178:107610, 2025.
[25] Mohammed Latif Siddiq, Simantika Dristi, Joy Saha, and Joanna C. S. Santos. The fault in our stars: Quality assessment of code generation benchmarks. In SCAM, pages 201–212. IEEE, 2024.
[26] Priyan Vaithilingam, Tianyi Zhang, and Elena L. Glassman. Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI, pages 1–7, 2022.
[27] Hugo Villamizar, Jannik Fischbach, Alexander Korn, Andreas Vogelsang, and Daniel Méndez. Prompts as software engineering artifacts: A research agenda and preliminary findings. In PROFES, volume 16361 of LNCS, pages 470–478. Springer, 2025.
[28] Andreas Vogelsang. Prompting the future: Integrating generative LLMs and requirements engineering. In REFSQ Workshops, 2024.
[29] Huanting Wang, Jingzhi Gong, Huawei Zhang, and Zheng Wang. AI agentic programming: A survey of techniques, challenges, and opportunities. arXiv preprint, 2025.
[30] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[31] Tao Xiao, Christoph Treude, Hideaki Hata, and Kenichi Matsumoto.
DevGPT: Studying developer–ChatGPT conversations. In MSR, pages 227–230, 2024.