RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction
Retrosynthesis prediction is a core task in organic synthesis that aims to predict reactants for a given product molecule. Traditionally, chemists select a plausible bond disconnection and derive the corresponding reactants, a process that is time-consuming and requires substantial expertise. While recent advances in molecular large language models (LLMs) have made progress, many methods either predict reactants without strategic reasoning or conduct only a generic product analysis, rather than reasoning explicitly about the bond-disconnection strategies that logically lead to the choice of specific reactants. To overcome these limitations, we propose RetroReasoner, a retrosynthetic reasoning model that leverages chemists' strategic thinking. RetroReasoner is trained using both supervised fine-tuning (SFT) and reinforcement learning (RL). For SFT, we introduce SyntheticRetro, a framework that generates structured disconnection rationales alongside reactant predictions. For RL, we use round-trip accuracy as the reward: predicted reactants are passed through a forward synthesis model, and a prediction is rewarded when the forward-predicted product matches the original input product. Experimental results show that RetroReasoner not only outperforms prior baselines but also generates a broader range of feasible reactant proposals, particularly on more challenging reaction instances.
💡 Research Summary
RetroReasoner introduces a novel approach to single‑step retrosynthesis prediction that explicitly mirrors the strategic reasoning employed by human chemists. Traditional retrosynthesis relies on identifying a plausible bond‑forming event, disconnecting that bond to generate abstract fragments called synthons, and then mapping each synthon to a commercially available reagent (synthetic equivalent). Recent molecular large language models (LLMs) have treated retrosynthesis as a SMILES‑to‑SMILES translation problem, but most of them either output reactants without any intermediate reasoning or provide only generic product‑level analysis, leaving a logical gap between the decision to break a bond and the final reactant set.
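The synthon-to-equivalent mapping described above can be illustrated with a deliberately simple, toolkit-free sketch. The amide example, the `[+]`/`[-]` synthon notation, and the lookup table below are all hypothetical stand-ins: real systems operate on atom-mapped SMILES with reaction templates rather than string lookups.

```python
# Toy illustration of retrosynthetic disconnection (no cheminformatics
# toolkit): cleave the amide C-N bond of N-methylacetamide, yielding two
# abstract synthons, then map each synthon to a purchasable synthetic
# equivalent. The table is a hand-written stand-in for template logic.
SYNTHON_TO_EQUIVALENT = {
    "CC(=O)[+]": "CC(=O)Cl",  # acylium synthon -> acetyl chloride
    "[-]NC": "NC",            # amide anion synthon -> methylamine
}

def retro_amide(product="CC(=O)NC"):
    """Return (synthons, reactants) for the conceptual C-N disconnection
    of N-methylacetamide. Purely illustrative; the fragments are fixed."""
    synthons = ["CC(=O)[+]", "[-]NC"]  # fragments from C-N bond cleavage
    reactants = [SYNTHON_TO_EQUIVALENT[s] for s in synthons]
    return synthons, reactants
```

In practice the disconnection site is chosen by a learned or rule-based model, and the synthon-to-equivalent step is exactly what RetroReasoner's final reasoning stage makes explicit.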
To close this gap, the authors propose a two‑stage training pipeline. The first stage, Supervised Fine‑Tuning (SFT), uses a synthetic data generator named SyntheticRetro. For each reaction, SyntheticRetro extracts three categories of supporting information from the RXN SMILES: (1) directly usable information (atom‑mapped SMILES, functional groups, ring counts, stereochemistry statistics), (2) model‑predicted information (reaction center, bond order predictions from a forward model), and (3) rule‑derived information (heuristic mappings of synthons to reagents). These pieces are fed to a general‑purpose LLM (GPT‑oss‑20B), which is prompted to produce a structured reasoning narrative consisting of four steps: (R1) product analysis, (R2) identification of key substructures, (R3) strategic bond disconnection, and (R4) mapping of synthons to synthetic equivalents. Each step is wrapped in explicit XML‑like tags (<PRODUCT_INFO>, <CANDIDATE_STRUCTURE>, <STRATEGIC_BOND_DISCONNECTION>, <SYNTHETIC_EQUIVALENT>) and linked by natural‑language connectors between consecutive steps (L12, L23, L34). For every reaction, fifteen distinct linking texts are generated, and during SFT the model sees a different variant each epoch, encouraging it to learn a diverse set of logical pathways rather than a single deterministic script.
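The assembly of one SFT training sample from the four tagged steps and a per-epoch linking variant can be sketched as follows. The tag names come from the text; the function name, argument layout, and rotation scheme (`epoch % n_variants`) are assumptions about one plausible implementation, not the authors' code.

```python
# Tag names as described in the SyntheticRetro format (R1-R4).
TAGS = [
    "PRODUCT_INFO",
    "CANDIDATE_STRUCTURE",
    "STRATEGIC_BOND_DISCONNECTION",
    "SYNTHETIC_EQUIVALENT",
]

def build_sft_sample(step_texts, linking_variants, epoch, n_variants=15):
    """Assemble one reasoning narrative: four tagged steps joined by
    natural-language connectors, picking a different linking-text
    variant each epoch so the model sees diverse logical pathways.

    step_texts       -- list of 4 strings, one per reasoning step (R1-R4)
    linking_variants -- list of n_variants lists, each holding the 3
                        connectors (L12, L23, L34) between steps
    epoch            -- current training epoch (selects the variant)
    """
    variant = linking_variants[epoch % n_variants]  # rotate per epoch
    parts = []
    for i, (tag, text) in enumerate(zip(TAGS, step_texts)):
        parts.append(f"<{tag}>{text}</{tag}>")
        if i < len(variant):  # connector after steps R1-R3 only
            parts.append(variant[i])
    return "\n".join(parts)
```

With fifteen variants, epochs 0 and 15 see the same connectors while epochs 0 and 1 differ, which matches the described per-epoch rotation.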
The second stage applies Reinforcement Learning with Verifiable Reward (RLVR). After SFT, the model becomes a policy πθ that generates full reasoning texts and reactant SMILES given a product. The reward is defined as round‑trip accuracy: the predicted reactants are fed into a forward synthesis model, and if the forward‑predicted product matches the original target, the output receives a positive reward. This reward is group‑normalized across G sampled outputs per query, and the policy is updated using Group Relative Policy Optimization (GRPO), a critic‑free variant of PPO that stabilizes training when multiple valid outputs exist. By rewarding feasibility rather than exact match to a single reference, the RL stage encourages the model to propose alternative yet chemically plausible reactant sets, addressing the multi‑modal nature of retrosynthesis.
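The round-trip reward and the GRPO group normalization can be written down concretely. This is a minimal sketch: `forward_model` is a hypothetical callable standing in for the trained forward synthesis model (which would compare canonical SMILES), and the binary 0/1 reward and mean/std normalization follow the standard GRPO formulation rather than any implementation detail confirmed by the paper.

```python
import math

def round_trip_reward(predicted_reactants, target_product, forward_model):
    """Verifiable reward: 1.0 if the forward model maps the predicted
    reactants back to the original product, else 0.0. `forward_model`
    is a stand-in callable; in practice both sides would be compared
    as canonical SMILES strings."""
    return 1.0 if forward_model(predicted_reactants) == target_product else 0.0

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each of the G sampled
    outputs' rewards by the mean and standard deviation of the group,
    so no learned value function (critic) is needed."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group of four samples with rewards `[1, 0, 0, 1]`, the two feasible outputs get positive advantages and the two infeasible ones negative advantages of equal magnitude, which is what drives the policy toward round-trip-consistent reactant sets.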
Extensive experiments were conducted on USPTO‑Full (≈1 M reactions), USPTO‑50K, and several out‑of‑distribution test sets enriched with rare atoms, rare functional groups, and uncommon n‑gram token patterns. RetroReasoner (both SFT and RL versions) consistently outperformed prior LLM‑based baselines such as Chem‑R, ChemDFM, Retro‑Expert, and other template‑free models. Metrics included top‑k exact‑match accuracy, feasible ratio (percentage of generated reactant sets that pass the round‑trip check), and diversity (average pairwise dissimilarity of proposals). RetroReasoner achieved 5–10 percentage‑point gains in top‑k accuracy and roughly doubled the feasible ratio compared with the strongest baseline. Notably, on challenging reaction classes (e.g., thioether formation, multi‑bond disconnections) where baseline models often failed completely, RetroReasoner still recovered correct reactants in 30–40% of cases.
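The three evaluation metrics named above can be sketched as simple functions. Note the hedges: the paper does not specify its dissimilarity measure (Tanimoto distance over molecular fingerprints is a common choice), so `dissimilarity` is left as a caller-supplied callable, and `forward_model` is again a hypothetical stand-in for the round-trip checker.

```python
def top_k_accuracy(predictions, reference, k):
    """Exact match: does the reference reactant set (as a canonical
    string) appear among the top-k ranked predictions?"""
    return reference in predictions[:k]

def feasible_ratio(predictions, target_product, forward_model):
    """Fraction of predicted reactant sets that pass the round-trip
    check, i.e. the forward model recovers the target product."""
    passed = sum(1 for p in predictions if forward_model(p) == target_product)
    return passed / len(predictions)

def diversity(predictions, dissimilarity):
    """Average pairwise dissimilarity over all distinct proposal pairs;
    `dissimilarity` is any symmetric measure in [0, 1]."""
    n = len(predictions)
    if n < 2:
        return 0.0
    total = sum(dissimilarity(predictions[i], predictions[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```

Feasible ratio and diversity together capture the paper's point: a model can score well on top-k exact match while still proposing few alternative, chemically valid routes.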
Beyond quantitative gains, the generated reasoning texts are human‑readable and follow a logical progression that a chemist could audit. The authors demonstrate examples where the model correctly identifies a carbonyl‑centered bond, proposes a synthon corresponding to an acyl chloride, and maps it to a commercially available acid chloride reagent, all while providing explanatory linking sentences. This interpretability distinguishes RetroReasoner from black‑box translation models and opens the door to interactive AI‑assisted synthesis planning.
The paper concludes with several avenues for future work: extending the framework to multi‑step retrosynthetic planning, incorporating reaction conditions (catalyst, solvent, temperature) into the reasoning, and integrating the system with laboratory automation platforms for closed‑loop synthesis. In summary, RetroReasoner showcases that coupling strategically structured reasoning data with a verifiable round‑trip reward can substantially improve both the accuracy and the chemical plausibility of LLM‑driven retrosynthesis, marking a significant step toward AI systems that think like chemists.