Exploring LLMs for User Story Extraction from Mockups
User stories are one of the most widely used artifacts in the software industry for defining functional requirements. In parallel, high-fidelity mockups make it easier for end users to participate in specifying their needs. In this work, we explore how combining these techniques with large language models (LLMs) enables agile, automated generation of user stories from mockups. To this end, we present a case study that analyzes the ability of LLMs to extract user stories from high-fidelity mockups, both with and without a Language Extended Lexicon (LEL) glossary included in the prompts. Our results show that incorporating the LEL significantly improves the accuracy and suitability of the generated user stories. This approach represents a step forward in integrating AI into requirements engineering, with the potential to improve communication between users and developers.
💡 Research Summary
The paper investigates the use of large language models (LLMs) to automatically generate user stories from high‑fidelity mockup images, a task that traditionally requires manual translation of visual UI designs into textual requirements. The authors propose a multimodal workflow that combines three key artifacts: (1) a browser‑based mockup creation tool that lets end‑users add or modify UI elements using a distinctive hand‑drawn style, (2) a domain‑specific glossary called the Language Extended Lexicon (LEL) that formally defines actors, actions, objects, states, and relationships for the application domain, and (3) an LLM (GPT‑4) accessed via API that processes both the mockup image and the textual prompt (with or without the LEL) to produce a structured user story.
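The workflow described above can be sketched as a minimal prompt-construction step. This is an illustrative sketch, not the authors' implementation: the instruction wording and function names are assumptions, and it follows the OpenAI chat-completions multimodal message format (text parts plus base64 `image_url` parts), which is one plausible way to send the two mockup screenshots with an optional LEL glossary appended to the prompt.

```python
import base64


def encode_image(path):
    """Read a mockup screenshot and return it as a base64 data URL."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


def build_messages(original_png, modified_png, lel_glossary=None):
    """Assemble a multimodal prompt: both mockup images plus an
    optional LEL glossary appended to the instruction text.
    (Instruction wording is illustrative, not taken from the paper.)"""
    instruction = (
        "Compare the original and modified mockups and write a user story "
        "('As a <role>, I want <goal>, so that <benefit>') for the change."
    )
    if lel_glossary:
        instruction += "\n\nDomain glossary (LEL):\n" + lel_glossary
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url", "image_url": {"url": encode_image(original_png)}},
            {"type": "image_url", "image_url": {"url": encode_image(modified_png)}},
        ],
    }]
```

The resulting `messages` list would then be passed to a vision-capable model via the chat-completions API; the with/without-LEL conditions in the study differ only in whether the glossary text is appended.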
The paper first reviews related work on mockup automation (e.g., Mockplug, SketchingInterfaces), on LLMs in software engineering (requirements generation, ambiguity resolution, traceability), and on the role of LEL as a structured metamodel for reducing terminology ambiguity. It then details the proposed collaborative workflow, enumerating the roles of end‑users, requirements engineers, product owners, and developers, and describing how the generated user story can be directly injected into a Kanban board or other requirement‑management tools.
Two case studies are presented. The first uses a simple YouTube mockup where a new “stats” button is added. With a basic prompt that includes only the original and modified screenshots, the LLM correctly identifies the new element, its placement, and its purpose, producing a concise user story. The second case involves LeafLab, a specialized web application for botanists. Here the domain‑specific term “points” carries a nuanced meaning related to field survey sampling. When the LEL glossary (defining “points” as a measurement of survey frequency) is appended to the prompt, the LLM accurately interprets the term and generates a user story that captures the requirement to sort species by field relevance rather than by internal ID. Without the LEL, the model misinterprets “points” as a generic score, leading to an inaccurate story.
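To make the LeafLab case concrete, a LEL entry for "points" might look like the sketch below. The field names follow LEL's conventional structure (a notion plus behavioral responses), but the exact schema and wording are assumptions for illustration, not the paper's artifact; `render_glossary` shows one simple way to flatten such entries into the plain text appended to the prompt.

```python
# Hypothetical LEL entry for the domain term "points" (LeafLab case).
# Field names and wording are illustrative; the paper's exact LEL may differ.
lel_entry = {
    "symbol": "points",
    "type": "object",
    "notion": "A count of how often a species was recorded in field survey samples.",
    "behavioral_responses": [
        "Points increase each time the species appears in a new survey sample.",
        "Species can be sorted by points to reflect field relevance.",
    ],
}


def render_glossary(entries):
    """Flatten LEL entries into plain text suitable for a prompt."""
    lines = []
    for e in entries:
        lines.append(f"{e['symbol']} ({e['type']}): {e['notion']}")
        for r in e["behavioral_responses"]:
            lines.append(f"  - {r}")
    return "\n".join(lines)
```

With such an entry in the prompt, the model has an explicit definition that rules out the generic "score" reading the study observed in the no-LEL condition.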
The evaluation, based on expert review, shows that incorporating the LEL dramatically improves both the correctness and domain relevance of the generated stories, especially in specialized domains. The authors argue that this approach reduces the manual effort required to keep visual mockups and textual requirements synchronized, and that the automated pipeline can be scaled to integrate with existing agile tools.
Limitations are acknowledged: constructing the LEL requires domain expertise and incurs upfront cost; the current LLM’s image‑understanding capabilities may falter on more complex or low‑quality mockups; prompt engineering still relies on practitioner skill; and the empirical validation is limited to two examples. Future work includes automated LEL generation, testing with larger and more diverse datasets, leveraging newer multimodal LLMs with stronger vision capabilities, and creating a continuous feedback loop that updates the LEL and user stories as the UI evolves.
Overall, the study demonstrates that LLMs, when augmented with a structured domain lexicon, can bridge the gap between visual UI design artifacts and textual user stories, offering a promising direction for AI‑enhanced requirements engineering.