RAG4Tickets: AI-Powered Ticket Resolution via Retrieval-Augmented Generation on JIRA and GitHub Data

Notice: This research summary and analysis were automatically generated using AI technology; for authoritative details, please refer to the original arXiv paper.

Modern software teams frequently encounter delays in resolving recurring or related issues due to fragmented knowledge scattered across JIRA tickets, developer discussions, and GitHub pull requests (PRs). To address this challenge, we propose a Retrieval-Augmented Generation (RAG) framework that integrates Sentence-Transformers for semantic embeddings with FAISS-based vector search to deliver context-aware ticket resolution recommendations. The approach embeds historical JIRA tickets, user comments, and linked PR metadata to retrieve semantically similar past cases, which are then synthesized by a Large Language Model (LLM) into grounded and explainable resolution suggestions. The framework contributes a unified pipeline linking JIRA and GitHub data, an embedding and FAISS indexing strategy for heterogeneous software artifacts, and a resolution generation module guided by retrieved evidence. Experimental evaluation using precision, recall, resolution time reduction, and developer acceptance metrics shows that the proposed system significantly improves resolution accuracy, fix quality, and knowledge reuse in modern DevOps environments.


💡 Research Summary

The paper presents RAG4Tickets, a Retrieval‑Augmented Generation (RAG) framework designed to accelerate and improve the quality of software ticket resolution by jointly leveraging JIRA issue data and GitHub pull‑request information. The authors identify a common pain point in modern DevOps environments: knowledge about past bugs, work‑arounds, and code changes is scattered across multiple tools, causing developers to waste time searching for relevant precedents. To address this, the system builds a unified pipeline that:

1. Ingests tickets, comments, and PR metadata via the JIRA and GitHub APIs.
2. Normalizes and cleans the raw text, removing boilerplate, preserving stack traces, and summarizing diffs with AST‑based parsers.
3. Encodes each artifact into dense vectors, using lightweight Sentence‑Transformers (all‑MiniLM‑L6‑v2, multi‑qa‑MPNet‑base‑dot‑v1) for natural language and CodeBERT/GraphCodeBERT for code snippets.
4. Indexes all embeddings with FAISS, selecting the HNSW index for its balance of recall and latency.
5. Feeds the top‑k retrieved items into a Large Language Model (LLM) such as GPT‑4, Claude, or a fine‑tuned LLaMA‑3 via a carefully crafted prompt that includes ticket metadata, relevant PR summaries, and historical resolution steps.
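The retrieval step (steps 3–4 above) can be sketched with a brute‑force cosine‑similarity search standing in for the FAISS HNSW index; the artifact IDs and toy 3‑dimensional vectors below are hypothetical stand‑ins for the 384‑dimensional MiniLM embeddings the paper describes:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=3):
    # index: list of (artifact_id, embedding) pairs, e.g. built from
    # JIRA tickets and PR summaries encoded by a sentence-transformer.
    scored = [(artifact_id, cosine(query_vec, vec)) for artifact_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings; a real deployment would query a FAISS HNSW index instead.
index = [
    ("JIRA-101", [0.9, 0.1, 0.0]),
    ("JIRA-202", [0.0, 1.0, 0.2]),
    ("PR-17",    [0.8, 0.2, 0.1]),
]
hits = top_k([1.0, 0.0, 0.0], index, k=2)
print([artifact_id for artifact_id, _ in hits])  # → ['JIRA-101', 'PR-17']
```

In production, `top_k` would be replaced by a call into the FAISS index, which approximates the same nearest‑neighbor ranking at far lower latency over 200k+ vectors.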

The generation stage produces grounded, step‑by‑step remediation plans, hyperlinks to the original PRs/commits, confidence scores, and natural‑language rationales that explain why the suggested fix is appropriate. These outputs are delivered directly into developers’ workflows: as comments on the JIRA ticket, as inline suggestions in a GitHub Action, or as automated RCA (Root‑Cause Analysis) hints in CI/CD pipelines. The modular architecture allows each component—data ingestion, embedding, retrieval, and generation—to be upgraded independently, facilitating scalability and maintainability.
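One plausible shape for the evidence‑grounded prompt that feeds the generation stage is sketched below; the field names, ticket keys, and URL are hypothetical, not taken from the paper:

```python
def build_prompt(ticket, retrieved):
    # ticket: dict with fields pulled from the JIRA API.
    # retrieved: top-k artifacts from the vector index, each carrying a
    # similarity score and a link back to the original PR or ticket,
    # so the LLM's answer stays traceable to real evidence.
    evidence = "\n".join(
        f"- [{r['id']}] (score {r['score']:.2f}) {r['summary']} -> {r['url']}"
        for r in retrieved
    )
    return (
        "You are a ticket-resolution assistant. Using ONLY the evidence "
        "below, propose step-by-step remediation and cite each step.\n\n"
        f"Ticket {ticket['key']}: {ticket['summary']}\n"
        f"Description: {ticket['description']}\n\n"
        f"Evidence from past tickets and PRs:\n{evidence}\n"
    )

prompt = build_prompt(
    {"key": "JIRA-404", "summary": "UI crash on feature-flag toggle",
     "description": "App crashes when the flag is flipped at runtime."},
    [{"id": "PR-17", "score": 0.96, "summary": "Guard flag reads with a default",
      "url": "https://github.com/org/repo/pull/17"}],
)
print(prompt)
```

Restricting the model to the retrieved evidence, and requiring per‑step citations, is what lets the system attach hyperlinks and confidence scores to each suggestion rather than free‑form text.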

A thorough evaluation is conducted on a corporate dataset comprising over 200 k tickets, comments, and PRs. Retrieval performance is measured with Recall@k, Mean Reciprocal Rank (MRR), and precision, while downstream impact is assessed via Mean Time To Resolution (MTTR) reduction and developer acceptance rate. The HNSW‑based FAISS index achieves Recall@10 = 0.84 and MRR = 0.71. When coupled with the LLM, the system reduces average resolution time by 42 % and attains a 68 % acceptance rate among engineers. A detailed case study on a React 19 micro‑service migration demonstrates that the system can surface three historically similar incidents and their associated PRs, enabling a developer to resolve a “UI crash on feature‑flag toggle” issue without manual digging.
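The retrieval metrics reported above follow standard definitions, which can be computed as follows (a generic formulation, not the authors' evaluation script; the example IDs are hypothetical):

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant items that appear in the top-k results.
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)

def mrr(queries):
    # queries: list of (ranked_ids, relevant_ids) pairs. For each query,
    # take the reciprocal rank of the first relevant hit (0 if none),
    # then average over all queries.
    total = 0.0
    for ranked_ids, relevant_ids in queries:
        for rank, item in enumerate(ranked_ids, start=1):
            if item in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(queries)

queries = [
    (["PR-17", "JIRA-101", "JIRA-202"], {"JIRA-101"}),  # first hit at rank 2
    (["JIRA-303", "PR-9"], {"JIRA-303"}),               # first hit at rank 1
]
print(mrr(queries))  # → 0.75
print(recall_at_k(["PR-17", "JIRA-101"], {"JIRA-101", "JIRA-505"}, k=2))  # → 0.5
```

Under these definitions, the paper's Recall@10 = 0.84 means that, on average, 84 % of known‑relevant past artifacts surface in the top ten results, and MRR = 0.71 means the first relevant hit typically lands near rank 1–2.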

Design trade‑offs are explicitly discussed. The authors choose on‑premise FAISS over managed vector databases (e.g., Pinecone, Weaviate) to satisfy enterprise security and cost constraints while exploiting GPU acceleration. Lightweight sentence‑transformer models are preferred over large fine‑tuned models to keep inference latency low; code‑aware embeddings are added only where they bring measurable benefit. By grounding LLM outputs in retrieved evidence, the RAG architecture mitigates hallucination, improves factual consistency, and provides traceability—critical for enterprise adoption.

In conclusion, RAG4Tickets demonstrates that a tightly coupled retrieval‑generation pipeline can turn fragmented development artifacts into actionable knowledge, delivering measurable productivity gains and higher solution quality. Future work will explore multimodal extensions (including test logs and schema files), online learning from developer feedback, and cost‑effective fine‑tuning strategies for the LLM component.

