AI-ready design of realistic 2D materials and interfaces with Mat3ra-2D
Artificial intelligence (AI) and machine learning (ML) models in materials science are predominantly trained on ideal bulk crystals, limiting their transferability to real-world applications where surfaces, interfaces, and defects dominate. We presen…
Authors: Vsevolod Biryukov, Kamal Choudhary, Timur Bazhirov
Vsevolod Biryukov∗, Kamal Choudhary†‡, Timur Bazhirov∗§

∗ Exabyte Inc. (Mat3ra.com), Walnut Creek, CA 94596, USA
† Department of Materials Science and Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
‡ Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
§ Corresponding author: timur@mat3ra.com

Abstract

Artificial intelligence (AI) and machine learning (ML) models in materials science are predominantly trained on ideal bulk crystals, limiting their transferability to real-world applications where surfaces, interfaces, and defects dominate. We present Mat3ra-2D, an open-source framework for rapid design of realistic two-dimensional materials and related structures, including slabs and heterogeneous interfaces, with support for disorder and defect-driven complexity. The approach combines: (1) well-defined standards for storing and exchanging materials data with a modular implementation of core concepts and (2) transformation workflows expressed as configuration-builder pipelines that preserve provenance and metadata. We implement typical structure generation tasks, such as constructing orientation-specific slabs or strain-matching interfaces, in reusable Jupyter notebooks that serve as both interactive documentation and templates for reproducible runs. To lower the barrier to adoption, we design the examples to run in any web browser and demonstrate how to incorporate these developments into a web application. Mat3ra-2D enables systematic creation and organization of realistic 2D- and interface-aware datasets for AI/ML-ready applications.

1 Introduction

Modern materials research increasingly relies on digital data and automation to shorten the time from hypothesis to validated insight. Large-scale open databases and workflow management systems have enabled high-throughput computational screening and accelerated discovery across broad application domains [1–7].
At the same time, the processes that govern materials properties in practice, combined with the rapid growth of atomically thin materials, have created a need for datasets and tooling that go beyond ideal bulk crystals and capture the structures most relevant to experiments and devices, including monolayers, slabs, interfaces, and heterostructures with defects and disorder [8–13].

A persistent limitation is that many machine-learning models and benchmark datasets are dominated by idealized three-dimensional crystals, while real-world performance often depends on surfaces, reconstructions, disorder, and defects, as well as heterogeneous interfaces that determine transport, catalysis, and stability [14, 15]. Bridging this gap requires (i) systematic generation of realistic structures and (ii) a consistent representation of structure, transformations, and provenance suitable for databases and AI/ML pipelines.

This challenge is particularly acute for 2D and surface-dominated systems. In such cases, the relevant scientific degrees of freedom often arise from how a structure is created rather than from stoichiometry alone: orientation, termination, layer count, vacuum spacing, strain state, stacking registry, defect type, and post-processing steps can all alter the resulting material model. As a result, generating realistic structures for computation is not simply a matter of selecting a material from a database, but of constructing and documenting a sequence of transformations that leads to the target geometry.
As demonstrated in a recent study comparing 16 graph-based machine-learned force fields [16], the lack of data on structural features relevant to real-world semiconductor applications can hinder a model's applicability to interfaces.

Mat3ra-2D addresses these needs with an open-source framework for practical design of realistic 2D materials and related structures, including slabs, interfaces, defects, and other low-dimensional motifs relevant to device modeling. The framework combines shared data standards, reusable transformation logic, provenance-aware workflows, and example-driven access paths so that realistic structures can be generated, inspected, reused, and adapted across different computational settings. In this sense, it builds on our prior work on accessible computational materials design [17], data-centric ecosystem design [18], practical categorization of computational models [19], interpretable materials ML [20], and, most directly, M-CODE as a categorization and ontology layer for realistic structures [21].

Mat3ra-2D is part of Mat3ra.com, a collaborative platform for computational materials R&D [18], and is focused on organizing realistic structure generation as a reusable software stack spanning standards, common data, implementation packages, and interactive notebooks. The approach is designed to interoperate with established Python tooling for materials analysis and atomistic modeling, including pymatgen and ASE [22, 23], while also supporting browser-based execution for lower-friction access to representative workflows. The resulting focus on explicit structure-generation choices is aligned with our broader effort to make materials datasets and models more traceable and reusable across the digital materials workflow [19–21].

The emphasis of the present work is therefore practical as well as conceptual.
We focus not only on what kinds of realistic structures can be represented, but also on how they can be created through reusable workflows, documented with explicit provenance, and shared through openly accessible notebooks. This combination is intended to support both systematic dataset generation and more transparent communication of how realistic computational structures are built.

The remainder of this manuscript is organized as follows. Section 2 outlines the overall approach and the main components of the ecosystem. Section 3 summarizes representative outcomes and example workflows enabled by Mat3ra-2D. Section 4 discusses implications, practical considerations, and future directions. Section 5 concludes with a brief summary.

2 Methodology

2.1 Overview

Mat3ra-2D is organized as an implementation stack for practical generation of realistic structures, such as slabs, surfaces, interfaces, and defective configurations, with explicit provenance. Rather than presenting a single monolithic codebase, the methodology is distributed across standards, common data, implementation layers, and usage examples, each with a distinct role in making realistic structure generation reusable and reproducible.

2.2 Components

2.2.1 Standards

The lowest layer of the stack is the standards package mat3ra-esse [24, 25], which provides the schemas, ontology, categorization, and exchange format used throughout the ecosystem. Its role is to provide a shared machine-readable representation for structures, inputs, and provenance so that higher-level tools operate on consistent concepts. These standards, together with representative schema and JSON examples, are based on the M-CODE categorization and ontology described in the corresponding manuscript [21]. In the present work, we use that foundation as the substrate for realistic 2D structure generation rather than reintroducing the full categorization itself.
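As a rough illustration of what a machine-readable record with provenance could look like, the sketch below stores a source material name together with an ordered list of build steps and round-trips it through JSON. The class and field names (BuildStep, ProvenanceRecord, source_material, steps) are hypothetical and are not taken from mat3ra-esse; the actual schemas are defined in the standards package [24, 25]. The parameter values echo the SrTiO3(110) slab example of Section 3.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class BuildStep:
    """One recorded transformation (hypothetical layout, not the ESSE schema)."""
    name: str
    parameters: dict[str, Any]


@dataclass
class ProvenanceRecord:
    """A source material plus the ordered steps applied to it (hypothetical)."""
    source_material: str
    steps: list[BuildStep] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys gives a stable text form that is easy to diff and hash
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ProvenanceRecord":
        data = json.loads(text)
        steps = [BuildStep(**step) for step in data["steps"]]
        return cls(source_material=data["source_material"], steps=steps)


record = ProvenanceRecord(
    source_material="SrTiO3",
    steps=[
        BuildStep(
            name="create_slab",
            parameters={
                # Lists rather than tuples so the JSON round trip is lossless
                "miller_indices": [1, 1, 0],
                "termination_formula": "SrTiO",
                "number_of_layers": 4,
                "vacuum": 10.0,
            },
        )
    ],
)

restored = ProvenanceRecord.from_json(record.to_json())
print(restored == record)
```

Keeping the steps as plain data, rather than as opaque objects, is what would allow such a record to be validated against a schema and replayed by a builder later.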
2.2.2 Commonly-used data

The next layer is mat3ra-standata [26], which distributes commonly used example materials and related records that adhere to the shared standards. In practice, this package serves as a reusable source of input structures for workflows and notebooks, allowing examples to begin from consistent reference materials rather than redefining them locally.

2.2.3 OOD abstraction layer

Above the standards layer, mat3ra-code provides implementations of the abstract level of the standardized entities as Python and JavaScript/TypeScript classes, following object-oriented design (OOD). This layer is not the main user-facing entry point for the present work; instead, it acts as an abstraction layer used by higher-level packages. Its role is to keep the common "abstract" portions of the specific concepts (such as materials, configurations, and transformations, described in the subsequent section) consistent across the codebase and to reduce duplication when the same concepts are used in different tools.

2.2.4 Materials design implementation

The main implementation layer is mat3ra-made [27]. This package contains the concrete materials-design functionality used to construct realistic structures: material objects, builders, analyzers, and modifiers for operations such as slab generation, interface construction, strain handling, and defect creation. In practical terms, this is the layer where standardized inputs become executable workflows.

A key pattern in this layer is provenance-aware transformation: materials are created or modified through explicit configurations and recorded build steps. Some transformations can be expressed as a single build step, while others require staged workflows.
For example, strain-matched interface generation can be expressed as Define–Refine–Build: define film and substrate slabs, refine by enumerating and ranking commensurate matches, and build the selected interface while recording strain and relative shifts as metadata. Figure 1 summarizes this staged workflow, and Section 3 provides a concrete Gr/Ni(001) example with API snippets.

Figure 1: Interface transformation stages: define, refine, and build for selecting and constructing a strain-matched interface with recorded metadata.

2.2.5 Notebook examples and browser access

The user-facing examples are distributed through mat3ra-api-examples [28]. This layer shows how the stack is used in practice through reusable Jupyter notebooks. The collection includes generic notebooks for individual transformations and multi-step workflow notebooks that combine several operations into more advanced examples. These notebooks serve both as documentation and as reproducible templates for adaptation.

The notebook layer is designed to operate in conventional Python environments and in browser-based runtimes. In particular, the same examples can be executed through JupyterLite using Pyodide, enabling inspection and execution without local installation. This browser-accessible layer is important for accessibility and dissemination, while the Results section shows the kinds of structures and notebook collections that can be built on top of it.

2.3 Open-source ecosystem approach

Taken together, these layers form an open-source ecosystem rather than a single application. Standards, common data, implementation packages, and notebooks are distributed separately so that they can be reused independently or together, depending on the use case. This separation of concerns supports validation, integration into other software systems, and gradual extension of the platform.
We therefore view Mat3ra-2D not only as a code package, but as a reusable stack for realistic structure generation with provenance. We welcome community contributions that extend the schemas, reference data, transformation tooling, and notebook examples, and that broaden the coverage of realistic structure categories relevant to modeling and AI/ML applications.

3 Results

3.1 Build procedures

Mat3ra-2D enables systematic generation of structures that are central to applications but underrepresented in bulk-centric datasets. The images in Table 1 illustrate a number of representative structures generated using the framework, spanning interfaces, defects, reconstructions, and heterostructures. The build procedures implement typical design tasks via helper functions inside mat3ra-made [27]. We demonstrate the approach below.

3.1.1 Example 1: constructing slabs with specific terminations

Figure 2: SrTiO3(110) slab construction with termination control: (a) bulk SrTiO3 crystal from the reference data repository; (b) (110) slab with SrTiO termination and (c) (110) slab with O2 termination, obtained with the same Miller indices, layer count, and vacuum by changing the termination formula.

Generation of orientation-specific slabs with explicit termination control is straightforward: for example, a SrTiO3(110) slab is created by importing bulk SrTiO3 from the reference data repository, then applying a slab builder with Miller indices (1,1,0), specifying the number of layers, vacuum spacing, and the desired termination formula (e.g., "SrTiO" or "O2"). Figure 2 illustrates the bulk input and the two terminations. This enables systematic exploration of surface chemistry by generating alternative terminations within the same workflow and comparing their resulting structures and properties.
    from mat3ra.standata.materials import Materials
    from mat3ra.made.material import Material
    from mat3ra.made.tools.build.slab import create_slab

    # Load bulk SrTiO3 from the reference data and wrap it as a material
    bulk_srtio3 = Material(Materials.get_by_name_first_match("SrTiO3"))

    # Build a (110) slab with the SrTiO termination
    srtio3_slab = create_slab(
        crystal=bulk_srtio3,
        miller_indices=(1, 1, 0),
        termination_formula="SrTiO",
        number_of_layers=4,
        vacuum=10.0,
    )

3.1.2 Example 2: constructing interfaces without strain matching

Assembly of film/substrate interfaces without strain matching follows a define–build workflow. For example, a Ge(001)/Si(001) interface is constructed by defining both slabs independently (each with Miller indices (001), thickness, and vacuum), then combining them with a target interfacial gap (e.g., 1.2 Å) and optional lateral shift. Figure 3 illustrates the two slabs and the assembled interface. The resulting interface structure includes metadata recording the source materials and transformation parameters.

Figure 3: Ge(001)/Si(001) interface without strain matching: (a) Ge(001) film slab and (b) Si(001) substrate slab, defined independently with Miller indices (001), layer count, and vacuum; (c) combined Ge/Si(001) interface after applying a target interfacial gap and optional lateral shift.

    from mat3ra.standata.materials import Materials
    from mat3ra.made.material import Material
    from mat3ra.made.tools.build.slab import create_slab
    from mat3ra.made.tools.build.interface import create_interface_simple_between_slabs

    bulk_ge = Material(Materials.get_by_name_first_match("Ge"))
    bulk_si = Material(Materials.get_by_name_first_match("Si"))

    # Define the film and substrate slabs independently
    film_slab = create_slab(
        crystal=bulk_ge,
        miller_indices=(0, 0, 1),
        termination_formula="Ge",
        number_of_layers=2,
    )

    substrate_slab = create_slab(
        crystal=bulk_si,
        miller_indices=(0, 0, 1),
        termination_formula="Si",
        number_of_layers=2,
    )

    # Combine the slabs directly, with a 1.2 Å interfacial gap
    interface = create_interface_simple_between_slabs(
        substrate_slab=substrate_slab,
        film_slab=film_slab,
        gap=1.2,
    )

3.1.3 Example 3: strain-matched interfaces

For interfaces requiring commensurate matching, the framework provides a define–refine–build workflow. Figure 4 illustrates this process for constructing a Graphene/Ni(001) interface.

Define: Starting from bulk materials (Figures 4a–b), the film (graphene monolayer) and substrate (Ni(001) slab) configurations are created independently. Bulk crystals are imported from reference data and wrapped as materials, after which a slab builder is applied using Miller indices, layer counts, vacuum spacing, and termination formulas. The graphene film is a single-layer structure with (001) orientation, while the Ni substrate is a (001) slab with multiple layers.

Refine: A ZSL (Zur and McGill superlattice) analyzer enumerates commensurate matches by finding supercell combinations that minimize strain while controlling interface area. Figure 4c shows the strain–size trade-off plot, where each point represents a candidate configuration ranked by strain percentage and number of atoms (interface area). The analyzer returns multiple ranked options, allowing selection based on the desired balance between strain minimization and computational cost (system size).
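The strain–size trade-off in the Refine step can be illustrated with a toy ranking over candidate matches. The sketch below is not the ZSL analyzer from mat3ra-made: the Candidate class, the two helper functions, and all numerical values are hypothetical and serve only to show how a size cap and a strain-first ordering interact, and which candidates lie on the strain–size Pareto front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical commensurate match: strain in percent, size in atoms."""
    label: str
    strain_percent: float
    n_atoms: int


def rank(candidates, max_atoms):
    """Drop candidates above the size cap, then sort by strain, then size."""
    eligible = [c for c in candidates if c.n_atoms <= max_atoms]
    return sorted(eligible, key=lambda c: (c.strain_percent, c.n_atoms))


def pareto_front(candidates):
    """Keep candidates that no other candidate beats in both strain and size."""
    front = []
    for c in candidates:
        dominated = any(
            o.strain_percent <= c.strain_percent
            and o.n_atoms <= c.n_atoms
            and (o.strain_percent < c.strain_percent or o.n_atoms < c.n_atoms)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Illustrative values only; they are not results for Gr/Ni(001)
candidates = [
    Candidate("A", strain_percent=5.2, n_atoms=10),
    Candidate("B", strain_percent=1.8, n_atoms=34),
    Candidate("C", strain_percent=0.4, n_atoms=120),
    Candidate("D", strain_percent=2.5, n_atoms=60),
]

ranked = rank(candidates, max_atoms=100)
front = pareto_front(candidates)
print([c.label for c in ranked])
print([c.label for c in front])
```

With these made-up numbers, candidate C has the lowest strain but exceeds the size cap, so the ranking starts with B; D is dominated by B and drops off the Pareto front. Choosing along that front is the judgment call the Refine step leaves to the user.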
Build: The selected configuration is passed to the interface builder, which applies the supercell transformations to the film and substrate slabs, enforces the chosen interfacial gap and lateral shift, adds vacuum if requested, and records strain and relative shift as metadata. Figure 4d shows the resulting Graphene/Ni(001) interface structure with commensurate matching applied.

In practice, bulk crystals are retrieved from the reference data repository by name (e.g., "Graphene" and "Nickel") and wrapped as materials for downstream building and analysis (Figure 4a–b).

    from mat3ra.standata.materials import Materials
    from mat3ra.made.material import Material

    bulk_graphene = Material(Materials.get_by_name_first_match("Graphene"))
    bulk_nickel = Material(Materials.get_by_name_first_match("Nickel"))

The film and substrate slabs are then defined independently by applying a slab builder with Miller indices, termination formula, layer count, and vacuum spacing. In this example, graphene is defined as a single layer with C termination, while nickel is defined as a (001) slab with Ni termination and 3 layers.
    from mat3ra.made.tools.build.slab import create_slab

    # Film: single graphene layer with C termination
    film_slab = create_slab(
        crystal=bulk_graphene,
        miller_indices=(0, 0, 1),
        termination_formula="C",
        number_of_layers=1,
        vacuum=10.0,
    )

    # Substrate: three-layer Ni(001) slab with Ni termination
    substrate_slab = create_slab(
        crystal=bulk_nickel,
        miller_indices=(0, 0, 1),
        termination_formula="Ni",
        number_of_layers=3,
        vacuum=10.0,
    )

Figure 4: Graphene/Ni(001) interface construction workflow: (a) starting graphene monolayer film; (b) Ni(001) slab substrate; (c) ZSL analyzer output showing the strain–size trade-off for candidate configurations (each point represents a commensurate match ranked by strain percentage and interface area); (d) selected Gr/Ni(001) interface structure with commensurate matching applied. The workflow follows define–refine–build: define film and substrate slabs, refine by enumerating and ranking matches, then build the selected configuration with recorded metadata.

To refine the interface, a ZSL analyzer enumerates commensurate supercell matches subject to a maximum area constraint and returns a ranked list of candidate configurations (Figure 4c). The top-ranked candidate provides a strained film configuration paired with a matching substrate configuration.

    from mat3ra.made.tools.build.interface import ZSLInterfaceAnalyzer

    analyzer = ZSLInterfaceAnalyzer(
        film=film_slab,
        substrate=substrate_slab,
        max_area=50.0,
    )
    configurations = analyzer.get_configurations()
    # Take the top-ranked match: a strained film and its matching substrate
    strained_film, substrate = configurations[0]

Finally, the selected configurations are passed to an interface builder that constructs the combined interface structure while enforcing an interfacial gap, vacuum spacing, and an optional lateral shift for the film (Figure 4d).
    from mat3ra.made.tools.build.interface import create_interface

    interface = create_interface(
        film_configuration=strained_film,
        substrate_configuration=substrate,
        gap=2.1,
        vacuum=10.0,
        xy_film_shift=[0.0, 0.0],
    )

This workflow enables systematic exploration of interface configurations while maintaining provenance of the matching process, enabling reproducible dataset generation and transparent reporting of generation parameters.

3.2 Jupyter notebooks demonstrating the usage

The structural build procedures are demonstrated and shared as reusable Jupyter notebooks that serve as both interactive documentation and reproducible templates. The notebooks are interactive and editable: each accepts multiple input parameters, which are summarized at the top and can be changed by users. The inputs include the initial material structures (e.g., film and substrate for interface creation) and the parameters of the builder and of the subsequent downselection. The notebooks demonstrate how to use individual transformations (e.g., slab creation, interface construction, passivation). They are designed to be educational and adaptable: they include functionality for loading materials from the reference data repository, previewing structures, and saving results, and they provide default settings and parameters. Users can run these notebooks to understand the transformation approach, then adjust parameter values to create their desired structures. Each notebook focuses on a single transformation type, making it easy to learn and modify. The notebooks are available online at jupyterlite.mat3ra.com through a browser-based JupyterLite environment, making the examples directly inspectable and executable in any modern web browser without local installation or environment setup. Each notebook is annotated with an M-CODE tag [21] indicating the corresponding structure category.
The list of notebooks is presented in Table 1.

Figure 5: The JupyterLite environment hosting the transformation notebooks. The left panel shows the file system browser with the notebooks listed; the right panel shows the editor with the Introduction notebook open. The Introduction notebook contains the index and links to the other notebooks. The notebooks are organized by M-CODE [21] tag for quick navigation.

Figure 6: Example material transformation notebook opened in JupyterLite. The notebook exposes editable input cells and executable workflow steps so that users can modify parameters, run the build procedure (via the Run > Run All Cells menu item), and inspect intermediate results directly in the browser. The example shows the interface construction workflow. The input parameters contain the film and substrate slab configurations, the interfacial gap, and the lateral shift. The settings for the strain matching algorithm (e.g., the maximum area constraint) are also shown.

Figure 7: Final output produced by an example notebook in JupyterLite. The resulting structure can be previewed after execution, illustrating how the notebooks serve as interactive and reproducible templates for realistic structure generation. The example shows the final result of the Gr/Ni(001) interface construction workflow. The final structure is saved in the uploads folder and can be loaded back into other notebooks for further analysis or reuse.

Table 1: The list of notebooks for material transformations available at jupyterlite.mat3ra.com. The notebooks are organized according to the M-CODE [21] tag corresponding to the resulting structures produced. The Name column gives the name of the notebook. Notes describe the user-adjustable input parameters, the result, and the example structure.

| M-CODE | Name | Notes |
| P-3D-CRY | Load Material from Standata | Input: material identifier from the reference repository. Result: material structure. Example: Ni bulk. |
| P-3D-CRY | Create Supercell | Input: bulk crystal and replication matrix. Result: crystal supercell. Example: Si 3x3x3 supercell. |
| P-2D-MNL | Create Monolayer | Input: layered bulk crystal and cleavage settings. Result: isolated 2D monolayer. Example: graphene monolayer. |
| P-2D-SLB-S | Create Slab | Input: bulk crystal, Miller indices, layers, vacuum, termination. Result: a slab structure. Example: SrTiO3(110) slab with SrTiO termination. |
| P-1D-NWR | Create Nanowire | Input: bulk crystal, orientation, and cross-section settings. Result: finite-width nanowire. Example: Si(001) nanowire. |
| P-0D-NRB | Create Nanoribbon | Input: monolayer, cut direction, length, and width. Result: finite nanoribbon. Example: graphene armchair nanoribbon. |
| P-0D-NPR | Create Cluster | Input: crystal or slab plus shape parameters. Result: finite nanoparticle or cluster. Example: Au nanoparticle. |
| C-2D-HST | Create Heterostructure | Input: multiple layers, Miller indices, thickness, and stacking order. Result: layered heterostack. Example: Si/SiO2/HfO2/TiN heterostack. |
| C-2D-INT-S | Create Interface with no strain-matching | Input: film and substrate slabs, gap, and shift. Result: directly assembled interface. Example: Ge(001)/Si(001) interface. |
| C-2D-INT-Z | Create Interface with strain matching ZSL | Input: film and substrate slabs plus ZSL matching constraints. Result: commensurate interface. Example: Graphene/Ni(001) interface. |
| C-2D-INT-T | Create twisted interface with commensurate lattices | Input: two layers, twist angle, and matching settings. Result: twisted interface. Example: twisted MoS2/WS2 interface. |
| D-2D-ADA | Create adatom defect | Input: surface slab, adsorption coordinates, and adatom species. Result: slab with adatom defect. Example: Li adatom on Graphene. |
| D-2D-ISL | Create Island defect | Input: parent surface and island geometry parameters. Result: slab with surface island. Example: TiN island on TiN(001) surface. |
| D-2D-TER | Create a terrace | Input: slab and terrace cut orientation. Result: stepped surface model. Example: Pt(210) terrace. |
| D-2D-GBP | Create a grain boundary | Input: two grains and boundary matching parameters. Result: planar grain boundary. Example: Si(001)/(111) grain boundary. |
| D-1D-GBL | Create a Grain Boundary in 2D material | Input: 2D lattice, misorientation, and join settings. Result: line defect in a 2D sheet. Example: grain boundary in hBN at 9 degrees. |
| D-0D-VAC | Create point defect in a slab | Input: slab, target site, and removal operation. Result: vacancy defect. Example: vacancy in Graphene. |
| D-0D-SUB | Create point defect in a slab | Input: slab, target site, and replacement species. Result: substitutional defect. Example: Mg substitution in GaN. |
| D-0D-INT | Create point defect in a slab | Input: slab, insertion site, and added species. Result: interstitial defect. Example: interstitial O in SnO. |
| X-2D-PAS | Passivate slab | Input: slab, surface sites, and passivating species. Result: passivated surface. Example: H-passivated Cu(001) surface. |
| X-2D-PER | Add Sine Perturbation to a slab | Input: slab and perturbation amplitude or wavelength. Result: distorted surface structure. Example: sine perturbation in Graphene sheet. |
| X-1D-PAS | Passivate nanoribbon | Input: nanoribbon edge sites and passivating species. Result: edge-passivated ribbon. Example: H-passivated graphene armchair nanoribbon. |
| X-0D-CUT | Create box-cutout | Input: slab and box geometry for carving. Result: finite cutout structure. Example: box cutout in GaN(001) slab. |
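Because the tags in Table 1 follow a regular hyphen-separated pattern, a notebook index can be organized programmatically. The sketch below splits a tag into its fields and groups a few notebooks by the second field. The field names (family, dimensionality, kind, variant) are hypothetical labels chosen here for readability; the authoritative meaning of each field is defined by M-CODE [21], and only a subset of Table 1 is listed.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    """Fields of a hyphen-separated tag; the field names are assumptions."""
    family: str
    dimensionality: str
    kind: str
    variant: Optional[str]


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag such as 'C-2D-INT-Z' into three or four fields."""
    parts = tag.split("-")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected tag format: {tag!r}")
    variant = parts[3] if len(parts) == 4 else None
    return ParsedTag(parts[0], parts[1], parts[2], variant)


def group_by_dimensionality(entries):
    """Map the dimensionality field to the notebook names carrying it."""
    groups = defaultdict(list)
    for tag, name in entries:
        groups[parse_tag(tag).dimensionality].append(name)
    return dict(groups)


# A subset of (tag, notebook name) pairs copied from Table 1
entries = [
    ("P-3D-CRY", "Create Supercell"),
    ("P-2D-SLB-S", "Create Slab"),
    ("C-2D-INT-Z", "Create Interface with strain matching ZSL"),
    ("D-0D-VAC", "Create point defect in a slab"),
    ("X-1D-PAS", "Passivate nanoribbon"),
]

groups = group_by_dimensionality(entries)
for dimensionality, names in sorted(groups.items()):
    print(dimensionality, names)
```

The same grouping could be keyed on the first field instead, which would collect the notebooks into the P, C, D, and X blocks in which Table 1 is ordered.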
The same workflows can be executed in standard Python environments (JupyterLab, IPython) or directly in a web browser via Pyodide, lowering the setup and getting-started barriers while remaining interoperable with established toolkits such as pymatgen and ASE [22, 23]. The web-based execution enables integration into web applications such as the Materials Designer platform, facilitating data exchange between Python and web/JavaScript environments.

4 Discussion

Mat3ra-2D is motivated by a practical constraint in data-driven materials modeling: the structures most relevant to real devices and processes are rarely ideal bulk crystals. By making the generation of realistic slabs and interfaces explicit and reproducible, the framework supports the creation of training and benchmark datasets that better reflect the target deployment domain.

4.1 Procedure-oriented design and reuse

A central design choice in Mat3ra-2D is to treat realistic structure generation as a composition of reusable workflow steps rather than as a collection of isolated scripts. Slab generation, interface construction, defect insertion, passivation, and other operations are exposed through builders, analyzers, and reference data that can be combined in a controlled way. This has practical consequences for maintainability and extension: once a workflow for a representative case is defined, the same components can be reused across notebook examples, Python environments, and browser-based execution without redefining the concepts each time.

This workflow-oriented structure also helps connect different layers of the ecosystem. The same transformation logic that appears in an educational notebook can be reused in a scripted dataset-generation workflow or embedded in a larger platform context.
As the number of target structures grows, this reuse becomes increasingly important because it reduces duplication and keeps provenance handling, parameter conventions, and structure-building behavior consistent across examples.

4.2 Reproducibility and open access

A second contribution of Mat3ra-2D is the combination of provenance-aware generation with openly accessible execution environments. Provenance records capture the sequence of choices used to create a structure, including orientation, termination, thickness, strain, registry, and intermediate ranking criteria. This makes workflows inspectable and rerunnable rather than opaque. For realistic structures, where small modeling choices can produce meaningfully different systems, that transparency is essential.

The notebook collections make this principle concrete. Generic notebooks expose individual transformations in a form that can be reused and adapted, while more advanced notebooks show how multiple transformations can be chained into longer structure-generation workflows. Because the examples are available through JupyterLite, they can be opened and executed in a web browser without local setup. This lowers the barrier to scrutiny and reuse and supports one of the central aims of the project: shared structure-generation workflows should be reproducible by others, not only by their original authors.

4.3 Implications for data and AI/ML

Dataset quality is shaped not only by the underlying electronic-structure method but also by how structures are defined, transformed, and curated. Many widely used datasets and benchmarks emphasize structures that are easy to enumerate and standardize, especially ideal bulk crystals, which can lead to models that perform well on in-distribution tests but fail to transfer to the heterogeneous and defect-rich structures that dominate electronic devices and processing conditions [14, 15].
For 2D materials, relevant targets frequently involve slabs, reconstructions, point and extended defects, and film/substrate contacts, where behavior can be controlled by terminations, strain, stacking registry, and interface chemistry rather than by bulk stoichiometry alone [8, 9].

Within this context, Mat3ra-2D helps make realistic structure generation part of the modeling problem rather than a hidden preprocessing step. When structures carry explicit provenance, datasets can be refined in a targeted way, category-specific failures can be diagnosed more systematically, and training or benchmark collections can be regenerated as assumptions evolve. In that sense, the framework complements broader efforts to assess first-principles reproducibility [29] and to connect ML behavior to interpretable materials representations [20]. It also complements computational materials infrastructure that has accelerated high-throughput discovery through shared workflows and data platforms [1–5].

4.4 Practical considerations and limitations

The practical value of a framework such as Mat3ra-2D depends on how it is used to navigate an intrinsically large and uneven design space. Realistic 2D materials are combinatorial not only because there are many parent compounds, but also because each target structure introduces additional modeling choices. Interfaces require lattice matching, strain control, gap selection, and lateral registry decisions. Surfaces require choices of orientation, thickness, vacuum, and termination, and in some cases, reconstruction. Defect and disorder models introduce site selection, concentration, chemical identity, and symmetry considerations. Even when every transformation is individually well defined, the number of plausible combinations can grow rapidly.

This clarifies the scope of the present work.
Mat3ra-2D provides machinery to define, build, annotate, and share realistic structures, but it does not remove the scientific judgment required to decide which structures are representative, which intermediate candidates should be selected, or which fidelity level is appropriate for downstream simulation. Reproducible generation is not the same as validated generation: a structure can be regenerated exactly and still be a poor approximation to the experimental system if the wrong termination, reconstruction, or defect model was chosen. The framework, therefore, improves inspectability and reuse, but it does not replace benchmark design, convergence studies, or comparison with experiment.

4.5 Open-source development and interoperability

Mat3ra-2D is intended to complement, rather than replace, existing computational materials infrastructure. Data platforms and workflow systems have already enabled large-scale computation and data exchange, and the present framework focuses on the realistic structure-generation layer that connects input concepts, workflow steps, and reusable examples. This positioning is important for adoption: practical use will depend on robust converters, validators, database connectors, and reference-data pipelines that streamline the path from structure generation to calculation, storage, and reuse.

The open-source ecosystem is equally important. Schemas, reference data, code, and notebooks are most useful when they can be inspected, adapted, and extended by others. Open access to the notebooks is particularly valuable because it allows users to move from reading about a workflow to running it directly in the browser or reusing it in a conventional Python environment. In this sense, openness is not only a distribution choice; it is part of the method by which workflows become auditable, reusable, and easier to integrate into other toolchains.
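
To make the notion of an auditable, rerunnable workflow more tangible, the provenance records described in Section 4.2 can be pictured as a small serializable data structure. The following is a minimal sketch under stated assumptions, not the Mat3ra-2D API: the class names BuildStep and ProvenanceRecord, the parameter names, and the example values (a Ni(111) slab with a graphene film) are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class BuildStep:
    """One transformation in a structure-generation workflow (hypothetical)."""
    name: str
    parameters: Dict[str, Any]


@dataclass
class ProvenanceRecord:
    """Ordered list of build steps applied to a parent material (hypothetical)."""
    parent: str
    steps: List[BuildStep] = field(default_factory=list)

    def add(self, name: str, **parameters: Any) -> "ProvenanceRecord":
        # Append a step and return self so that calls can be chained.
        self.steps.append(BuildStep(name, parameters))
        return self

    def to_json(self) -> str:
        # sort_keys makes the serialization independent of keyword order.
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # Short deterministic identifier derived from the recorded choices.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()[:12]


def example_record(strain_percent: float = 1.5) -> ProvenanceRecord:
    """Build an illustrative record; all values are assumptions."""
    return (
        ProvenanceRecord(parent="Ni")
        .add("slab", miller_indices=[1, 1, 1], thickness_layers=4,
             vacuum_angstrom=15.0, termination="Ni")
        .add("interface", film="graphene", strain_percent=strain_percent,
             registry_shift=[0.0, 0.0])
    )


if __name__ == "__main__":
    record = example_record()
    print(record.to_json())
    print(record.fingerprint())
```

In this sketch, two records built from the same choices yield the same fingerprint, while changing a single parameter such as the strain yields a different one. That captures the distinction drawn in Section 4.4: the record guarantees that a structure can be regenerated exactly, but says nothing about whether the recorded choices are physically appropriate.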

4.6 Future directions

Several directions are important for realizing the full potential of realistic structure generation with provenance. One is validation: systematically generated interfaces, surfaces, and defect models should be compared against experimental observations and trusted computational benchmarks to identify which classes of approximations are reliable and which require refinement. Another is scalable dataset design: because realistic structure spaces are too large to enumerate exhaustively, useful progress will likely come from benchmark suites, agreed reporting conventions, and selective sampling strategies rather than brute-force coverage.

Broader access through open notebooks and browser execution also creates opportunities beyond benchmarking. Educational examples, surface-specific property datasets, transfer-learning studies from bulk to low-dimensional systems, and inverse-design loops in which target properties guide structure generation all become easier to organize when realistic structures can be generated, inspected, and regenerated through shared workflows. The long-term value of Mat3ra-2D will depend on how well these future directions combine accessibility with scientific rigor, allowing realistic structure generation to become both easier to use and easier to trust.

5 Conclusion

This work presented Mat3ra-2D as a framework for constructing realistic 2D materials and related structures with explicit, machine-readable provenance. Building on the M-CODE ontology and categorization framework [21], we illustrated representative outcomes and workflows for interfaces and other low-dimensional structures, and introduced notebook-based examples that support reproducible structure generation. A central idea of the project is that published structures should be reproducible by anyone from openly available online notebooks. By prioritizing realism, provenance, accessibility, and reuse, Mat3ra-2D aims to enable dataset design and modeling workflows that better reflect device-relevant materials complexity and improve the practical utility of AI/ML models.

Acknowledgements

This work was supported in part by NIST 70NANB24H205. We used large language models (LLMs) to assist with drafting and editing portions of this manuscript. All content was reviewed and edited by the authors, who take full responsibility for the final text.

Data and Software availability

M-CODE and the associated schemas are maintained as open-source data standards. The reference implementation is accessible as a package on GitHub and distributed via PyPI as mat3ra-esse. Related ecosystem packages and notebooks are available on GitHub and PyPI, including mat3ra-standata, mat3ra-made, and mat3ra-api-examples [24, 26–28].

References

[1] Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1):011002, 2013.
[2] Stefano Curtarolo, Wahyu Setyawan, Shidong Wang, Junkai Xue, Kesong Yang, Richard H Taylor, Lance J Nelson, Gus L W Hart, Stefano Sanvito, Marco Buongiorno-Nardelli, et al. AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations. Computational Materials Science, 58:227–235, 2012.
[3] James E Saal, Scott Kirklin, Muratahan Aykol, Bryce Meredig, and Christopher Wolverton. Materials design and discovery with high-throughput density functional theory: the Open Quantum Materials Database (OQMD). JOM, 65(11):1501–1509, 2013.
[4] Giovanni Pizzi, Andrea Cepellotti, Riccardo Sabatini, Nicola Marzari, and Boris Kozinsky. AiiDA: automated interactive infrastructure and database for computational science. Computational Materials Science, 111:218–230, 2016.
[5] NOMAD laboratory: A European Centre of Excellence, 2018. URL https://www.nomad-coe.eu/.
[6] Kamal Choudhary, Kevin F Garrity, Andrew CE Reid, Brian DeCost, Adam J Biacchi, Angela R Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A Gilad Kusne, Andrea Centrone, et al. The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Computational Materials, 6(1):173, 2020.
[7] Jaehyung Lee, Justin Ely, Kent Zhang, Akshaya Ajith, Charles Rhys Campbell, and Kamal Choudhary. AGAPI-Agents: An open-access agentic AI platform for accelerated materials design on atomgpt.org. arXiv preprint arXiv:2512.11935, 2025.
[8] Sten Haastrup, Mikkel Strange, Mohnish Pandey, Thorsten Deilmann, Per S Schmidt, Nicki F Hinsche, Morten N Gjerding, Daniele Torelli, Peter M Larsen, Anders C Riis-Jensen, et al. The computational 2D materials database: High-throughput modeling and discovery of atomically thin crystals. arXiv preprint arXiv:1806.03173, 2018.
[9] Filip A Rasmussen and Kristian S Thygesen. Computational 2D materials database: electronic structure of transition-metal dichalcogenides and oxides. The Journal of Physical Chemistry C, 119(23):13169–13183, 2015.
[10] Kiran Mathew, Arunima K Singh, Joshua J Gabriel, Kamal Choudhary, Susan B Sinnott, Albert V Davydov, Francesca Tavazza, and Richard G Hennig. MPInterfaces: A Materials Project based Python tool for high-throughput computational screening of interfacial systems. Computational Materials Science, 122:183–190, 2016.
[11] Kamal Choudhary, Irina Kalish, Ryan Beams, and Francesca Tavazza. High-throughput identification and characterization of two-dimensional materials using density functional theory. Scientific Reports, 7(1):5179, 2017.
[12] Kamal Choudhary, Kevin F Garrity, Steven T Hartman, Ghanshyam Pilania, and Francesca Tavazza. Efficient computational design of two-dimensional van der Waals heterostructures: Band alignment, lattice mismatch, and machine learning. Physical Review Materials, 7(1):014009, 2023.
[13] Kamal Choudhary and Kevin F Garrity. InterMat: accelerating band offset prediction in semiconductor interfaces with DFT and deep learning. Digital Discovery, 3(7):1365–1377, 2024.
[14] Logan Ward, Ankit Agrawal, Alok Choudhary, and Christopher Wolverton. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2:16028, Aug 2016. URL http://dx.doi.org/10.1038/npjcompumats.2016.28.
[15] Olexandr Isayev, Corey Oses, Cormac Toher, Eric Gossett, Stefano Curtarolo, and Alexander Tropsha. Universal fragment descriptors for predicting properties of inorganic crystals. Nature Communications, 8:15679, Jun 2017. URL http://dx.doi.org/10.1038/ncomms15679.
[16] Daniel Wines and Kamal Choudhary. CHIPS-FF: Evaluating universal machine learning force fields for material properties. ACS Materials Letters, 7(6):2105, 2025.
[17] Protik Das, Mohammad Mohammadi, and Timur Bazhirov. Accessible computational materials design with high fidelity and high throughput. arxiv.org/abs/1807.05623, 2018.
[18] T. Bazhirov. Data-centric online ecosystem for digital materials science. arXiv preprint arXiv:1902.10838, 2019.
[19] A. Zech and T. Bazhirov. CateCom: A practical data-centric approach to categorization of computational models. Journal of Chemical Information and Modeling, 62(5):1268–1281, 2022. doi: 10.48550/arXiv.2109.13452. URL https://doi.org/10.48550/arXiv.2109.13452.
[20] J. Dean, M. Scheffler, T. A. R. Purcell, S. V. Barabash, R. Bhowmik, and T. Bazhirov. Interpretable machine learning for materials design. Journal of Materials Research, 38:4477–4496, 2023. doi: 10.1557/s43578-023-01164-w. URL https://doi.org/10.1557/s43578-023-01164-w.
[21] Vsevolod Biryukov, Kamal Choudhary, and Timur Bazhirov. M-CODE: Materials categorization via ontology, dimensionality and evolution. arXiv preprint arXiv:2602.14384, 2026.
[22] Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent L Chevrier, Kristin A Persson, and Gerbrand Ceder. Python Materials Genomics (pymatgen): A robust, open-source Python library for materials analysis. Computational Materials Science, 68:314–319, 2013.
[23] Ask Hjorth Larsen, Jens Jørgen Mortensen, Jakob Blomqvist, Ivano E Castelli, Rune Christensen, Marcin Dułak, Jesper Friis, Michael N Groves, Bjørk Hammer, Cory Hargus, et al. The atomic simulation environment—a Python library for working with atoms. Journal of Physics: Condensed Matter, 29(27):273002, 2017.
[24] mat3ra-esse. Mat3ra.com, 2026. URL https://pypi.org/project/mat3ra-esse/.
[25] @mat3ra/esse. Mat3ra.com, 2026. URL https://www.npmjs.com/package/@mat3ra/esse.
[26] mat3ra-standata. Mat3ra.com, 2026. URL https://pypi.org/project/mat3ra-standata/.
[27] mat3ra-made. Mat3ra.com, 2026. URL https://pypi.org/project/mat3ra-made/.
[28] mat3ra-api-examples. Mat3ra.com, 2026. URL https://pypi.org/project/mat3ra-api-examples/.
[29] Kurt Lejaeghere, Gustav Bihlmayer, Torbjörn Björkman, Peter Blaha, Stefan Blügel, Volker Blum, Damien Caliste, Ivano E. Castelli, Stewart J. Clark, Andrea Dal Corso, Stefano de Gironcoli, Thierry Deutsch, John Kay Dewhurst, Igor Di Marco, Claudia Draxl, Marcin Dułak, Olle Eriksson, José A. Flores-Livas, Kevin F. Garrity, Luigi Genovese, Paolo Giannozzi, Matteo Giantomassi, Stefan Goedecker, Xavier Gonze, Oscar Grånäs, E. K. U. Gross, Andris Gulans, François Gygi, D. R. Hamann, Phil J. Hasnip, N. A. W. Holzwarth, Diana Iuşan, Dominik B. Jochym, François Jollet, Daniel Jones, Georg Kresse, Klaus Koepernik, Emine Küçükbenli, Yaroslav O. Kvashnin, Inka L. M. Locht, Sven Lubeck, Martijn Marsman, Nicola Marzari, Ulrike Nitzsche, Lars Nordström, Taisuke Ozaki, Lorenzo Paulatto, Chris J. Pickard, Ward Poelmans, Matt I. J. Probert, Keith Refson, Manuel Richter, Gian-Marco Rignanese, Santanu Saha, Matthias Scheffler, Martin Schlipf, Karlheinz Schwarz, Sangeeta Sharma, Francesca Tavazza, Patrik Thunström, Alexandre Tkatchenko, Marc Torrent, David Vanderbilt, Michiel J. van Setten, Veronique Van Speybroeck, John M. Wills, Jonathan R. Yates, Guo-Xu Zhang, and Stefaan Cottenier. Reproducibility in density functional theory calculations of solids. Science, 351(6280), 2016. ISSN 0036-8075. doi: 10.1126/science.aad3000. URL http://science.sciencemag.org/content/351/6280/aad3000.