CellForge: Agentic Design of Virtual Cell Models


Virtual cell modeling aims to predict cellular responses to diverse perturbations but faces challenges from biological complexity, multimodal data heterogeneity, and the need for interdisciplinary expertise. We introduce CellForge, a multi-agent framework that autonomously designs and synthesizes neural network architectures tailored to specific single-cell datasets and perturbation tasks. Given raw multi-omics data and task descriptions, CellForge discovers candidate architectures through collaborative reasoning among specialized agents, then generates executable implementations. Our core contribution is the framework itself: showing that multi-agent collaboration mechanisms - rather than manual human design or single-LLM prompting - can autonomously produce executable, high-quality computational methods. This approach goes beyond conventional hyperparameter tuning by enabling entirely new architectural components such as trajectory-aware encoders and perturbation diffusion modules to emerge from agentic deliberation. We evaluate CellForge on six datasets spanning gene knockouts, drug treatments, and cytokine stimulations across multiple modalities (scRNA-seq, scATAC-seq, CITE-seq). The results demonstrate that the models generated by CellForge are highly competitive with established baselines, while revealing systematic patterns of architectural innovation. CellForge highlights the scientific value of multi-agent frameworks: collaboration among specialized agents enables genuine methodological innovation and executable solutions that single agents or human experts cannot achieve. This represents a paradigm shift toward autonomous scientific method development in computational biology. Code is available at https://github.com/gersteinlab/CellForge.


💡 Research Summary

CellForge introduces a novel multi‑agent framework that autonomously designs, implements, and evaluates neural network architectures for virtual cell modeling—a task that seeks to predict how cells respond to diverse perturbations such as gene knockouts, drug treatments, and cytokine stimulations. The authors argue that current approaches require extensive manual effort, deep interdisciplinary expertise, and often rely on adapting existing models (e.g., scGPT, Geneformer, GEARS) that may not capture dataset‑specific nuances. To address this, CellForge decomposes the workflow into three tightly coupled modules: Task Analysis, Design, and Experiment Execution.

The Task Analysis module first parses raw multimodal single‑cell data (scRNA‑seq, scATAC‑seq, CITE‑seq), extracting metadata, batch information, sparsity statistics, and other quantitative descriptors without human intervention. Simultaneously, a Literature Retrieval agent conducts a hybrid search strategy: it starts from a curated list of 46 seminal papers on single‑cell perturbation analysis and expands outward using breadth‑first and depth‑first traversals of PubMed, guided by Sentence‑BERT similarity scores. The retrieved corpus is distilled into design principles (e.g., the importance of latent trajectory preservation, the utility of graph‑based gene interaction modeling).
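The expansion step can be pictured as a similarity-gated breadth-first traversal of the citation graph. The sketch below is illustrative only: the Sentence-BERT similarity is replaced with a toy word-overlap score, and `citation_graph`, `expand_corpus`, and the threshold value are assumptions, not the paper's actual interfaces.

```python
from collections import deque

# Toy stand-in for Sentence-BERT cosine similarity: Jaccard overlap of
# title words. The real system embeds abstracts with Sentence-BERT.
def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def expand_corpus(seeds, citation_graph, query, threshold=0.3, max_papers=50):
    """Breadth-first expansion from seed papers, keeping only papers whose
    similarity to the task query clears the threshold. Only papers that pass
    the gate have their citation neighbors explored."""
    corpus, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(corpus) < max_papers:
        paper = queue.popleft()
        if similarity(paper, query) >= threshold:
            corpus.append(paper)
            for neighbor in citation_graph.get(paper, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return corpus
```

Depth-first expansion follows the same shape with a stack in place of the queue; the similarity gate is what keeps the corpus on-topic as the traversal moves away from the 46 seed papers.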

These principles feed into a suite of specialized experts—Dataset Expert, Model Architecture Expert, Training Expert, and a central Critic Agent. They engage in a graph‑structured discussion where each round consists of proposal, critique, and refinement steps. Confidence scores for each expert are dynamically updated based on historical performance, peer evaluations, and the Critic’s assessments. The process iterates until confidence thresholds are met, at which point a concrete architecture specification is produced. Notably, the system invents components that have not been explicitly described in the literature, such as a “trajectory‑aware encoder” that combines multi‑scale variational autoencoders with PCA‑derived latent alignment, and a “perturbation diffusion module” that propagates perturbation information through a configurable gene‑interaction GNN before integration with downstream decoders.
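The paper does not give the exact confidence-update rule, so the sketch below assumes a simple exponential moving average that blends the Critic's assessment with averaged peer evaluations; the function names, weights, round budget, and threshold are all illustrative choices, not CellForge's actual parameters.

```python
def update_confidence(conf, critic_score, peer_score, lr=0.2):
    """Exponential moving average of an expert's confidence, blending the
    Critic's assessment with averaged peer evaluations (all in [0, 1])."""
    signal = 0.5 * critic_score + 0.5 * peer_score
    return (1 - lr) * conf + lr * signal

def run_discussion(experts, rounds=10, threshold=0.8):
    """Iterate proposal/critique/refinement rounds until every expert's
    confidence clears the threshold (or the round budget is spent).
    Each expert's dict carries its current confidence plus the Critic and
    peer scores its latest proposal received."""
    for r in range(rounds):
        for state in experts.values():
            state["conf"] = update_confidence(
                state["conf"], state["critic"], state["peer"])
        if all(s["conf"] >= threshold for s in experts.values()):
            return r + 1  # converged after r+1 rounds
    return rounds
```

Under this rule an expert whose proposals keep scoring well sees its confidence drift toward that score, so a run of strong, consistently endorsed proposals is what ends the discussion.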

The Experiment Execution module translates the finalized specification into production‑ready Python/PyTorch code, complete with training loops, evaluation scripts, and Docker containers for reproducibility. An automated debugging loop monitors runtime errors, convergence issues, and resource bottlenecks, prompting the agents to adjust hyper‑parameters or replace problematic sub‑modules. The resulting models are trained on six publicly available perturbation datasets spanning gene knockouts (e.g., Norman et al. 2019 CRISPRi K562), drug treatments, and cytokine stimulations across multiple modalities. Performance is assessed using mean squared error, Pearson correlation, and a perturbation‑consistency metric adapted from prior work, ensuring both statistical and biological relevance.
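The automated debugging loop amounts to a retry wrapper around the generated training code: run, capture the failure, let the agents revise the configuration, and run again. In this minimal sketch, `train_fn`, `patch_fn`, and the config fields stand in for the agents' actual repair step and are not CellForge's real interface.

```python
def run_with_auto_debug(train_fn, patch_fn, max_attempts=3):
    """Call the generated training function; on an exception, hand the
    error to `patch_fn` (standing in for the agents' repair step, e.g.
    lowering the learning rate or swapping a sub-module) and retry with
    the revised configuration."""
    config = {"lr": 1e-2, "batch_size": 512}  # illustrative starting config
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, train_fn(config)
        except Exception as err:
            config = patch_fn(config, err)  # agents adjust hyper-parameters
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

In the full system the monitored signals go beyond exceptions (convergence stalls, memory pressure), but the control flow is the same: the error context is fed back to the agents, which emit a patched specification for the next attempt.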

Empirical results show that CellForge‑generated models achieve competitive or superior performance compared to strong baselines such as GEARS, scGPT, Geneformer, and CPA. On average, mean squared error improves by 5–12% across datasets, with the most pronounced gains on highly sparse or batch‑affected data, where the agents' custom preprocessing pipelines and novel encoders mitigate noise. Ablation studies confirm that the newly invented modules contribute an additional 3–8% performance lift beyond what recombining existing building blocks can achieve.
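The two scalar metrics used throughout (mean squared error and Pearson correlation) reduce to their standard formulas; a minimal pure-Python version, applied per gene-expression profile, looks like this (the perturbation-consistency metric is adapted from prior work and is not reproduced here):

```python
import math

def mse(pred, true):
    """Mean squared error over paired expression values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def pearson(pred, true):
    """Pearson correlation between predicted and observed profiles;
    returns 0.0 when either profile has zero variance."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st) if sp and st else 0.0
```

Reporting both matters: MSE rewards matching absolute expression levels, while Pearson correlation rewards recovering the relative pattern of up- and down-regulated genes even when the overall scale is off.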

The paper’s contributions are twofold. First, it demonstrates that a coordinated ensemble of domain‑specific agents can automate the entire scientific method pipeline for computational biology—from data characterization through literature synthesis to method invention and code generation—thereby reducing reliance on expert intuition and manual engineering. Second, it provides concrete evidence that multi‑agent deliberation can generate genuinely novel methodological components, moving beyond hyper‑parameter search into the realm of autonomous methodological innovation. The authors discuss limitations, including stochastic variability across runs and the current focus on in‑silico model design, and outline future directions such as expanding the knowledge base, integrating wet‑lab experimental design agents, and applying the framework to other omics domains. Overall, CellForge represents a paradigm shift toward end‑to‑end autonomous research workflows in computational biology.

