Reproducibility in simulation-based computer architecture research requires coordinating artifacts such as disk images, kernels, and benchmarks, but existing workflows are inconsistent. To address these issues, we improve gem5, an open-source simulator with over 1,600 forks, and gem5 Resources, a centralized repository of over 2,000 pre-packaged artifacts. While gem5 Resources enables artifact sharing, researchers still face challenges. Creating custom disk images is complex and time-consuming, and there is no standardized process across ISAs, which makes images difficult to extend and share. gem5 provides limited guest-host communication through a fixed set of predefined exit events, which restricts researchers' ability to dynamically control and monitor simulations. Lastly, running simulations with multiple workloads requires researchers to write custom external scripts to coordinate multiple gem5 simulations, which creates error-prone and hard-to-reproduce workflows. To overcome these challenges, we introduce several features in gem5 and gem5 Resources. We standardize disk-image creation across x86, ARM, and RISC-V using Packer, and provide validated base images with pre-annotated benchmark suites (NPB, GAPBS). In total, we provide 12 new disk images, 6 new kernels, and over 200 workloads across three ISAs. We refactor the exit event system into a class-based model and introduce hypercalls for enhanced guest-host communication, allowing researchers to define custom behavior for their exit events. We also provide a utility to remotely monitor simulations and the gem5-bridge driver for user-space m5 operations. Additionally, we implement Suites and MultiSim to enable parallel full-system simulations from gem5 configuration scripts, eliminating the need for external scripting. These features reduce setup complexity and provide extensible, validated resources that improve reproducibility and standardization.
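To make the idea of a class-based exit event model concrete, the following is a minimal, self-contained sketch under stated assumptions: each exit event is an object whose `handle` method a researcher can override, and guest-issued hypercalls are dispatched to handlers by an integer ID. All names (`ExitEvent`, `WorkBegin`, `WorkEnd`, `ToySimulator`) and the hypercall IDs used below are hypothetical and are not gem5's actual API; the sketch only replays a list of IDs rather than running a guest.

```python
"""Hypothetical sketch of class-based exit events dispatched by hypercall ID.

None of these names are gem5 APIs; they only illustrate how per-event
classes let a researcher attach custom behavior to guest-triggered exits.
"""
from typing import Dict, Iterable, List


class ExitEvent:
    """Base class: one subclass per kind of exit the guest can trigger."""

    def handle(self, sim: "ToySimulator") -> bool:
        """Run this event's behavior; return True to stop the simulation."""
        return False


class WorkBegin(ExitEvent):
    def handle(self, sim: "ToySimulator") -> bool:
        sim.log.append("reset stats, start region of interest")
        return False  # keep simulating


class WorkEnd(ExitEvent):
    def handle(self, sim: "ToySimulator") -> bool:
        sim.log.append("dump stats, end region of interest")
        return True  # stop once the region of interest is finished


class ToySimulator:
    def __init__(self) -> None:
        self.log: List[str] = []
        # Map hypercall IDs issued by the guest to handler objects.
        self.handlers: Dict[int, ExitEvent] = {}

    def register(self, hypercall_id: int, event: ExitEvent) -> None:
        self.handlers[hypercall_id] = event

    def run(self, guest_hypercalls: Iterable[int]) -> str:
        """Replay hypercall IDs as if the guest issued them, in order."""
        for hid in guest_hypercalls:
            event = self.handlers.get(hid)
            if event is None:
                self.log.append(f"unhandled hypercall {hid}")
                continue
            if event.handle(self):
                return "exited"
        return "guest finished"
```

For example, after `sim.register(4, WorkBegin())` and `sim.register(5, WorkEnd())`, calling `sim.run([4, 7, 5, 9])` records the unhandled ID 7 and stops at ID 5. Keeping behavior in subclasses, rather than in one hard-coded switch, is what lets new exit types be added without editing the dispatcher.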
Reproducibility and usability remain major challenges in computer architecture research [1], particularly for simulation-based studies. These challenges are often framed in terms of the ACM's definitions of reproducibility levels: repeatability (same team, same setup), replicability (different team, same setup), and reproducibility (different team, different setup) [2], [28]. Achieving any of these levels is difficult because experimental workflows involve diverse toolchains, dependencies, and library configurations, which create substantial barriers to initial adoption and replication. Insufficient documentation of experimental workflows and execution environments further compromises transparency and reproducibility. The absence of standardized practices for sharing experimental artifacts hinders verification and future research, and a lack of version control makes it difficult to trace the exact state of a workflow. Together, these factors highlight the need for simulation infrastructure that is both robust and accessible, supported by well-documented, standardized methodologies that enable broader adoption and reliable reproduction of results.
We address these challenges within gem5, a widely used open-source computer architecture simulator [7], [18]. gem5 is an event-driven, modular tool capable of simulating a diverse range of computer systems, from simple in-order processors to advanced out-of-order processors, and from basic memory systems to complex memory hierarchies. gem5 supports both full-system (FS) simulation and syscall emulation (SE) mode, providing flexibility for various use cases. Beyond traditional CPU simulations, gem5 can model GPUs, network-on-chip (NoC) systems, and emerging technologies, broadening its applicability to different computing paradigms [11], [23], [27]. Written in C++ and Python, gem5 is highly configurable and extensible, and it supports multiple ISAs, including x86, ARM, and RISC-V. Researchers use gem5 in fields such as computer architecture, operating systems, compilers, and security. In industry, gem5 enables rapid iteration in performance modeling, software development, and hardware design.
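The phrase "event-driven" means that simulated time advances by jumping directly to the next scheduled event rather than stepping through every tick. The sketch below illustrates that general technique with a priority queue; it is not gem5's event queue implementation, and the `EventQueue` and `make_clocked` names, the tick values, and the two toy components are assumptions chosen for illustration.

```python
"""Minimal discrete-event simulation loop (generic technique, not gem5 code)."""
import heapq
import itertools
from typing import Callable, List, Optional, Tuple


class EventQueue:
    def __init__(self) -> None:
        self.cur_tick = 0
        self._heap: List[Tuple[int, int, Callable[[], None]]] = []
        # Sequence numbers break ties so same-tick events run in FIFO order.
        self._seq = itertools.count()

    def schedule(self, callback: Callable[[], None], when: int) -> None:
        if when < self.cur_tick:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._heap, (when, next(self._seq), callback))

    def run(self, max_tick: Optional[int] = None) -> int:
        """Process events in time order; return the tick of the last one run."""
        while self._heap:
            when, _, callback = self._heap[0]
            if max_tick is not None and when > max_tick:
                break
            heapq.heappop(self._heap)
            self.cur_tick = when  # jump straight to the next event's time
            callback()
        return self.cur_tick


def make_clocked(eq: EventQueue, name: str, period: int, cycles: int,
                 trace: List[Tuple[int, str]]) -> None:
    """Hypothetical component that records `cycles` ticks, one every `period`."""
    def tick(remaining: int) -> None:
        trace.append((eq.cur_tick, name))
        if remaining > 1:
            eq.schedule(lambda: tick(remaining - 1), eq.cur_tick + period)
    eq.schedule(lambda: tick(cycles), period)


if __name__ == "__main__":
    eq, trace = EventQueue(), []
    make_clocked(eq, "cpu", period=500, cycles=3, trace=trace)
    make_clocked(eq, "mem", period=750, cycles=2, trace=trace)
    eq.run()
    print(trace)
```

Here two components with different clock periods interleave correctly without the loop ever visiting the idle ticks between events, which is why event-driven simulators scale to detailed models with many independently clocked components.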
While gem5 offers a flexible simulation framework, setting up experiments using standard benchmarks and workloads can be complex, time-consuming, and ad hoc. In response, gem5Art and gem5 Resources were developed to enable reproducibility in gem5 [8]. gem5Art is a set of Python libraries for running gem5 experiments in a more structured way and storing results in a database for future use [8]. gem5 Resources is a centralized repository of artifacts that are not needed to build gem5 but are used by researchers to run experiments. Since its release, gem5 Resources has been maintained and expanded. These resources include ready-to-use benchmark suites (e.g., PARSEC [6], NAS Parallel Benchmarks (NPB) [4], GAP Benchmark Suite [5], VRG microbenchmarks [22]), workloads, and sampling techniques (e.g., checkpoints, SimPoints [26], [33], LoopPoints [29], [30]) provided in formats consumable by gem5.
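A centralized artifact repository helps reproducibility mainly when every experiment pins an exact artifact version and can verify that the bytes it received are the ones that version describes. The sketch below illustrates that idea only; the `Artifact` and `Catalog` classes, their field names, the use of SHA-256, and the example IDs are hypothetical assumptions and do not reflect the actual gem5 Resources schema or API.

```python
"""Hypothetical sketch: resolve artifacts by exact (id, version) and verify
their contents against a recorded digest. Not the gem5 Resources schema/API."""
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Artifact:
    resource_id: str
    version: str
    category: str  # e.g. "disk-image", "kernel", "workload"
    sha256: str    # expected digest of the artifact's bytes


class Catalog:
    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Artifact] = {}

    def add(self, artifact: Artifact) -> None:
        self._entries[(artifact.resource_id, artifact.version)] = artifact

    def resolve(self, resource_id: str, version: str, data: bytes) -> Artifact:
        """Look up an exact version and check the supplied bytes match it."""
        try:
            artifact = self._entries[(resource_id, version)]
        except KeyError:
            raise LookupError(f"{resource_id}@{version} is not in the catalog") from None
        if hashlib.sha256(data).hexdigest() != artifact.sha256:
            raise ValueError(f"checksum mismatch for {resource_id}@{version}")
        return artifact
```

Requiring an explicit version, instead of silently returning the latest one, is the design choice that makes a published experiment traceable to the exact state of its inputs.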
While gem5Art and gem5 Resources have simplified artifact discovery and management, key usability and reproducibility gaps remain. In this paper, we address three such challenges through contributions that have been merged upstream in the gem5 v25.0 release:
• Workload Standardization: Inconsistent disk image configurations and creation processes can lead to variable results. We standardize disk image creation by adopting a unified Packer-based workflow for x86, ARM, and RISC-V that uses the same modern Ubuntu LTS release and identical configuration parameters across ISAs. This standardization enables us to provide 12 new disk images, 6 new kernels, and over 200 widely used benchmarks across three ISAs. We also validate these resources so that researchers can plug them directly into their experiments. To address the difficulty of running simulations with many workloads, we also introduce a suite abstraction, which provides a structured way to define sets of workloads, and add initial support for multi-simulation, which runs multiple experiments directly from gem5 configuration scripts. Together, these features simplify experiment setup and reduce reliance on external orchestration, helping make workflows more self-contained and reproducible.
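The suite and multi-simulation ideas above can be illustrated with a small sketch, assuming a suite is simply a named collection of workloads that can be filtered by input group and then dispatched to parallel workers from a single script. `Workload`, `Suite`, `with_input_group`, `fake_simulate`, `run_all`, and the NPB-style workload names are hypothetical and are not gem5's Suite or MultiSim API; `fake_simulate` stands in for launching one simulation.

```python
"""Hypothetical sketch of a workload suite plus parallel runs from one script.
Not gem5's Suite/MultiSim API; fake_simulate stands in for a real simulation."""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterator, List, Tuple


@dataclass(frozen=True)
class Workload:
    name: str
    isa: str
    input_groups: FrozenSet[str] = frozenset()


@dataclass
class Suite:
    name: str
    workloads: List[Workload] = field(default_factory=list)

    def with_input_group(self, group: str) -> "Suite":
        """Return a sub-suite containing only workloads tagged with `group`."""
        picked = [w for w in self.workloads if group in w.input_groups]
        return Suite(f"{self.name}[{group}]", picked)

    def __iter__(self) -> Iterator[Workload]:
        return iter(self.workloads)

    def __len__(self) -> int:
        return len(self.workloads)


def fake_simulate(workload: Workload) -> Tuple[str, int]:
    """Stand-in for one simulation run; returns a deterministic fake metric."""
    return workload.name, sum(map(ord, workload.name))


def run_all(suite: Suite, max_workers: int = 4) -> Dict[str, int]:
    """Run every workload in the suite concurrently; collect results by name.

    Threads are used on the assumption that each worker would mostly wait on
    a separate simulator process, so the GIL is not a bottleneck here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fake_simulate, suite))


if __name__ == "__main__":
    npb = Suite("npb", [
        Workload("is.S", "x86", frozenset({"S"})),
        Workload("cg.S", "x86", frozenset({"S"})),
        Workload("cg.A", "x86", frozenset({"A"})),
    ])
    print(run_all(npb.with_input_group("S")))
```

Because the suite definition and the parallel dispatch live in the same script, the exact set of workloads that was run is recorded alongside the configuration, rather than in a separate orchestration script.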
gem5 is a widely used, cycle-level, open-source simulator for computer architecture research [7], [18]. With more than 1,600 GitHub forks and contributions from both academia and industry, it supports a broad range of systems, from simple in-order cores to complex out-of-order processors and advanced memory hierarchies.
Because architectural research typically requires evaluating designs across many workloads, gem5 enables simulation of diverse applications and benchmark suites in either syscall emulation (SE) mode or full-system (FS) mode. These workloads range from small microbenchmarks to programs drawn from larger suites such as PARSEC [6], NAS Parallel Benchmarks (NPB) [4], and GA