Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Autonomous scientific discovery with large language model (LLM)-based agents has recently made substantial progress, demonstrating the ability to automate end-to-end research workflows. However, existing systems largely rely on runtime-centric execution paradigms, repeatedly reading, summarizing, and reasoning over large volumes of scientific literature online. This on-the-spot computation strategy incurs high computational cost, suffers from context window limitations, and often leads to brittle reasoning and hallucination. We propose Idea2Story, a pre-computation-driven framework for autonomous scientific discovery that shifts literature understanding from online reasoning to offline knowledge construction. Idea2Story continuously collects peer-reviewed papers together with their review feedback, extracts core methodological units, composes reusable research patterns, and organizes them into a structured methodological knowledge graph. At runtime, underspecified user research intents are aligned to established research paradigms, enabling efficient retrieval and reuse of high-quality research patterns instead of open-ended generation and trial-and-error. By grounding research planning and execution in a pre-built knowledge graph, Idea2Story alleviates the context window bottleneck of LLMs and substantially reduces repeated runtime reasoning over literature. We conduct qualitative analyses and preliminary empirical studies demonstrating that Idea2Story can generate coherent, methodologically grounded, and novel research patterns, and can produce several high-quality research demonstrations in an end-to-end setting. These results suggest that offline knowledge construction provides a practical and scalable foundation for reliable autonomous scientific discovery.


💡 Research Summary

Idea2Story introduces a two‑stage framework that separates offline knowledge construction from online research generation to improve the efficiency and reliability of LLM‑driven autonomous scientific discovery. In the offline stage, the system continuously harvests recently accepted peer‑reviewed papers from top‑tier venues (e.g., NeurIPS, ICLR) together with anonymized review comments, ratings, and confidence scores. Using a structured extraction pipeline, it parses each paper’s introduction, methods, and experiments sections to identify core methodological units—self‑contained descriptions of problem formulation, model architecture, learning objectives, and optimization strategies—while discarding low‑level implementation details. These units are embedded, reduced with UMAP, and clustered via DBSCAN to form coherent research patterns, which are then linked in a methodological knowledge graph that captures semantic and compositional relationships among methods.
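The embed-reduce-cluster step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy vectors stand in for real method-unit embeddings, PCA stands in for UMAP (to keep the sketch dependency-light), and the `eps`/`min_samples` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Toy stand-in for method-unit embeddings: two tight groups in 64-d space,
# mimicking two distinct methodological themes.
group_a = rng.normal(loc=0.0, scale=0.05, size=(20, 64))
group_b = rng.normal(loc=1.0, scale=0.05, size=(20, 64))
embeddings = np.vstack([group_a, group_b])

# Reduce dimensionality before density-based clustering.
# (The paper uses UMAP; PCA is used here only as a lightweight stand-in.)
reduced = PCA(n_components=2).fit_transform(embeddings)

# DBSCAN groups nearby units into candidate research patterns;
# label -1 marks noise points that join no pattern.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(reduced)

# Collect member indices per discovered pattern.
patterns = {}
for idx, label in enumerate(labels):
    if label == -1:
        continue
    patterns.setdefault(int(label), []).append(idx)
```

Each resulting cluster would then be summarized into a reusable research pattern and linked into the knowledge graph.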

During the online stage, a user’s informal research intent is embedded and aligned with existing patterns in the graph. The system retrieves the most relevant research patterns, composes compatible method units into a concrete research plan, and iteratively refines the plan through a review‑guided loop that evaluates novelty, methodological soundness, and coherence. The refined blueprint drives downstream modules for feasibility‑driven experimentation, code generation, and end‑to‑end paper drafting, ultimately producing a submission‑ready manuscript.
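The retrieval step described above amounts to nearest-neighbor search over pattern embeddings. Below is a minimal cosine-similarity sketch under the assumption that both the user intent and the stored patterns are already embedded; the function name `cosine_top_k` and the toy 3-d vectors are illustrative, not from the paper.

```python
import numpy as np

def cosine_top_k(intent_vec: np.ndarray, pattern_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k stored patterns most similar to the intent embedding."""
    intent = intent_vec / np.linalg.norm(intent_vec)
    patterns = pattern_vecs / np.linalg.norm(pattern_vecs, axis=1, keepdims=True)
    scores = patterns @ intent          # cosine similarity of each pattern to the intent
    return np.argsort(scores)[::-1][:k]  # highest-scoring patterns first

# Toy example: four pattern embeddings; the first two align with the intent.
pattern_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
intent_vec = np.array([1.0, 0.05, 0.0])
top = cosine_top_k(intent_vec, pattern_vecs, k=2)
# top → indices [0, 1]
```

In the full system, the retrieved patterns would seed the review-guided refinement loop rather than being used verbatim.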

By grounding runtime reasoning in a pre‑built graph, Idea2Story alleviates the context‑window bottleneck of large language models, reduces repeated costly document processing, and mitigates hallucination risks. Preliminary qualitative case studies and limited quantitative evaluations demonstrate that the generated research narratives are more methodologically coherent and aligned with established scientific practices than those produced by purely runtime‑centric agents. However, the paper acknowledges several open challenges: the accuracy of method‑unit extraction, the need for frequent graph updates to stay current with emerging research, potential bias introduced by reviewer feedback, and the lack of large‑scale comparative benchmarks. Overall, Idea2Story proposes a promising pre‑computation‑driven paradigm that could scale autonomous scientific discovery, provided that future work addresses extraction robustness, graph maintenance, and comprehensive evaluation.

