MolMark: Safeguarding Molecular Structures through Learnable Atom-Level Watermarking
AI-driven molecular generation is reshaping drug discovery and materials design, yet the lack of protection mechanisms leaves AI-generated molecules vulnerable to unauthorized reuse and provenance ambiguity. This limitation undermines both scientific reproducibility and intellectual property security. To address this challenge, we propose MolMark, the first deep-learning-based watermarking framework for molecules, designed to embed high-fidelity digital signatures into molecules without compromising molecular functionality. MolMark learns to modulate chemically meaningful atom-level representations and enforces geometric robustness through SE(3)-invariant features, so that the watermark survives rotation, translation, and reflection. Additionally, MolMark integrates seamlessly with AI-based molecular generative models, enabling watermarking to be treated as a learned transformation with minimal interference to molecular structures. Experiments on benchmark datasets (QM9, GEOM-DRUG) and state-of-the-art molecular generative models (GeoBFN, GeoLDM) demonstrate that MolMark can embed 16-bit watermarks while retaining more than 90% of essential molecular properties, preserving downstream performance, and enabling >95% extraction accuracy under SE(3) transformations. MolMark establishes a principled pathway for unifying molecular generation with verifiable authorship, supporting trustworthy and accountable AI-driven molecular discovery.
💡 Research Summary
MolMark introduces the first deep‑learning based watermarking framework specifically designed for AI‑generated molecular structures. The authors identify three fundamental challenges that distinguish molecules from images or proteins: (i) extreme sparsity and limited redundancy at the atom level, (ii) high sensitivity of chemical bonds to even minute geometric perturbations, and (iii) the necessity for invariance under rotation, translation, and reflection, because molecular data are often processed in arbitrary coordinate frames. The paper labels all three transformations SE(3), although strictly SE(3) covers rotations and translations while reflections belong to the larger group E(3). To address these issues, MolMark embeds a binary watermark into the 3‑dimensional coordinates of atoms while leaving the atom‑type and charge features untouched, thereby preserving the chemical identity of the molecule.
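The invariance requirement in (iii) can be illustrated with a minimal sketch. It assumes the invariant features are built from inter-atomic distances, which is a common choice but is not confirmed by the summary; the toy geometry and the helper names (pairwise_distances, rotate_z, translate, reflect_x) are hypothetical.

```python
import math

def pairwise_distances(coords):
    """List the inter-atomic distances for every unordered atom pair."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]

def rotate_z(coords, angle):
    """Rotate all atoms about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def translate(coords, shift):
    """Shift all atoms by the same 3-vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]

def reflect_x(coords):
    """Mirror all atoms through the yz-plane."""
    return [(-x, y, z) for x, y, z in coords]

# Toy three-atom geometry (illustrative values, not taken from the paper).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Apply a rotation, then a translation, then a reflection.
moved = reflect_x(translate(rotate_z(coords, 1.1), (3.0, -2.0, 0.5)))

before = pairwise_distances(coords)
after = pairwise_distances(moved)
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(f"largest change in any distance: {max_diff:.2e}")
```

Because the distances agree up to floating-point error, any decoder that reads only such features sees the same input regardless of the coordinate frame.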
The system consists of a watermark encoder Eϕ and a decoder Dθ. The encoder is built from four modules: (1) a position‑processing module that projects raw atom positions into a high‑dimensional latent space, creating redundancy that can be safely exploited; (2) an atom embedder that combines one‑hot atom types with sinusoidal positional encodings of atomic charges, generating rich atom‑level descriptors; (3) an edge embedder that incorporates inter‑atomic distances and connectivity to produce edge features that encode bond information; and (4) a cross‑processing module that fuses the three feature streams with the binary watermark vector, produces a position mask, and adds it to the original coordinates to obtain the watermarked geometry G′. Importantly, only the positions p are altered, so atom types and charges are preserved by construction; keeping bonds chemically valid then depends on the positional shifts staying small.
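The data flow through the atom embedder and the final coordinate update can be sketched as follows. This is only an illustration under stated assumptions: the atom vocabulary, the encoding width, the transformer-style frequency schedule, and the hand-written offsets are all hypothetical, and the learned modules of MolMark are not reproduced here.

```python
import math

# Assumed vocabulary: the five elements that occur in QM9.
ATOM_TYPES = ["H", "C", "N", "O", "F"]

def one_hot(symbol):
    """One-hot vector over the assumed atom vocabulary."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]

def sinusoidal(value, dim=4):
    """Transformer-style sin/cos encoding of a scalar (assumed schedule)."""
    out = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        out += [math.sin(value * freq), math.cos(value * freq)]
    return out

def atom_descriptor(symbol, charge):
    """Concatenate the atom-type one-hot with the encoded charge."""
    return one_hot(symbol) + sinusoidal(charge)

def apply_offsets(positions, offsets):
    """Watermarked geometry: original positions plus per-atom offsets."""
    return [tuple(p + d for p, d in zip(pos, off))
            for pos, off in zip(positions, offsets)]

positions = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
# Stand-in for the encoder output; a trained model would predict these.
offsets = [(0.01, 0.0, 0.0), (0.0, -0.02, 0.0)]

descriptor = atom_descriptor("C", 6)
watermarked = apply_offsets(positions, offsets)
print(len(descriptor), watermarked)
```

Only the coordinates change in apply_offsets, which mirrors the design choice described above: atom types and charges feed the encoder as context but are never modified.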
Training is guided by a composite loss that balances three objectives: (a) a reconstruction loss (L2 distance between original and watermarked positions) to keep geometric distortion minimal; (b) a property‑preservation loss that penalizes deviations in key molecular descriptors (bond lengths, angles, electronic properties, etc.); and (c) a cross‑entropy loss for accurate watermark recovery. The authors propose a “dynamically balanced” training schedule that automatically adjusts the relative weights of these losses, preventing the model from over‑optimizing one objective at the expense of the others.
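A rough sketch of such a composite objective is given below. The summary does not specify the balancing rule, so the inverse-magnitude weighting in balanced_weights is purely an assumed stand-in, as are the mean-squared form of the position term and the single bond-length term used in place of the full property loss.

```python
import math

def position_loss(orig, marked):
    """Mean squared displacement per atom (assumed form of the L2 term)."""
    total = sum((a - b) ** 2
                for p, q in zip(orig, marked) for a, b in zip(p, q))
    return total / len(orig)

def bond_length_loss(orig, marked, bonds):
    """Stand-in property term: mean absolute change in bonded distances."""
    diffs = [abs(math.dist(orig[i], orig[j]) - math.dist(marked[i], marked[j]))
             for i, j in bonds]
    return sum(diffs) / len(diffs)

def bit_cross_entropy(bits, probs, eps=1e-12):
    """Binary cross-entropy between true bits and decoder probabilities."""
    terms = [b * math.log(p + eps) + (1 - b) * math.log(1 - p + eps)
             for b, p in zip(bits, probs)]
    return -sum(terms) / len(bits)

def balanced_weights(losses, eps=1e-8):
    """Hypothetical rule: weight each loss by its inverse magnitude,
    normalized to sum to 1, so no single term dominates."""
    inv = [1.0 / (value + eps) for value in losses]
    norm = sum(inv)
    return [w / norm for w in inv]

orig = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
marked = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0)]

losses = [
    position_loss(orig, marked),
    bond_length_loss(orig, marked, bonds=[(0, 1)]),
    bit_cross_entropy([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.6]),
]
weights = balanced_weights(losses)
total = sum(w * value for w, value in zip(weights, losses))
print(weights, total)
```

In a real training loop the weights would be recomputed periodically and excluded from gradient computation; here they are plain numbers to keep the example self-contained.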
Experiments are conducted on two benchmark datasets—QM9 (small organic molecules) and GEOM‑DRUG (drug‑like compounds)—and on two state‑of‑the‑art 3D generative models, GeoBFN and GeoLDM. A 16‑bit watermark is embedded in each molecule. Results show that after watermarking, atomic stability (the fraction of atoms that retain correct bonding) exceeds 97.6 % and overall molecular stability exceeds 94.6 %. Downstream tasks such as docking scores and quantum‑chemical properties remain virtually unchanged, demonstrating that the watermark does not impair functional performance. Moreover, under random SE(3) transformations the decoder recovers the embedded watermark with >95 % accuracy, confirming the robustness of the SE(3)‑invariant design.
A practical use‑case, illustrated as the “Alice‑Elaine” scenario, demonstrates how a data owner can distribute uniquely watermarked copies of a molecular library to different collaborators. If a leak occurs, the owner can extract the watermark from the leaked molecules and trace the source, providing a concrete mechanism for IP protection and data‑leak detection in high‑stakes domains such as pharmaceuticals.
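The tracing step in this scenario can be sketched as a nearest-match lookup. The registry contents, the collaborator names, and the error tolerance below are hypothetical; the summary does not describe how the authors match an extracted watermark to a recipient.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def trace_leak(extracted, registry, max_errors=2):
    """Return (recipient, distance) for the closest registered watermark,
    or (None, distance) if even the closest one has too many bit errors."""
    best = min(registry, key=lambda name: hamming(extracted, registry[name]))
    distance = hamming(extracted, registry[best])
    return (best, distance) if distance <= max_errors else (None, distance)

# Hypothetical 16-bit watermarks issued to two collaborators.
registry = {
    "lab_A": [0, 1] * 8,
    "lab_B": [1, 1, 0, 0] * 4,
}

# Simulate a leaked copy of lab_B's molecule with one bit decoded wrongly.
leaked = list(registry["lab_B"])
leaked[0] ^= 1

source, errors = trace_leak(leaked, registry)
print(source, errors)
```

Allowing a small number of bit errors matters because extraction accuracy is reported as above 95 % rather than exact, so a strict equality check could miss a genuine leak.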
In summary, MolMark delivers (1) a chemically safe method for embedding digital signatures at the atom level, (2) strong invariance to geometric transformations via SE(3)‑invariant feature engineering, and (3) seamless integration with modern 3D molecular generative pipelines. The work opens a new research direction at the intersection of cheminformatics, deep learning, and digital rights management, and suggests future extensions such as higher‑capacity watermarks, multi‑watermark schemes, and end‑to‑end pipelines that include experimental synthesis verification.