Towards single-shot coherent imaging via overlap-free ptychography

O. Hoidn,1,* A. Mishra,1 S. Henke,2 A. Vong,2 and M. Seaberg1

1 SLAC National Accelerator Laboratory, Menlo Park, California 94025, USA
2 Argonne National Laboratory, Lemont, Illinois 60439, USA
* ohoidn@slac.stanford.edu

Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, limiting throughput and increasing dose. Extending coherent diffractive imaging to overlap-free operation on extended samples remains an open problem. Here, we extend PtychoPINN (O. Hoidn et al., Scientific Reports 13, 22789, 2023) to deliver overlap-free, single-shot reconstructions in a Fresnel coherent diffraction imaging (CDI) geometry while also accelerating conventional multi-shot ptychography. The framework couples a differentiable forward model of coherent scattering with a Poisson photon-counting likelihood; real-space overlap enters as a tunable parameter via coordinate-based grouping rather than a hard requirement. On synthetic benchmarks, reconstructions remain accurate at low counts ($\sim 10^4$ photons/frame), and overlap-free single-shot reconstruction with an experimental probe reaches amplitude structural similarity (SSIM) 0.904, compared with 0.968 for overlap-constrained reconstruction. Against a data-saturated supervised model with the same backbone (16,384 training images), PtychoPINN achieves higher SSIM with only 1,024 images and generalizes to unseen illumination profiles. Throughput per graphics processing unit (GPU) is approximately 40× that of least-squares maximum-likelihood (LSQ-ML) reconstruction at matched 128 × 128 resolution. These results, validated on experimental data from the Advanced Photon Source and the Linac Coherent Light Source, unify single-exposure Fresnel CDI and overlapped ptychography within one framework, supporting dose-efficient, high-throughput imaging at modern light sources.

1. Introduction

Modern light sources, such as fourth-generation synchrotrons and X-ray free-electron lasers (XFELs), generate coherent diffraction data far faster than images can be reconstructed [1]. This growing gap between acquisition and analysis precludes real-time feedback and on-the-fly experimental steering, both essential for maximizing the scientific output of these facilities. Ptychographic coherent diffraction imaging (CDI) is a cornerstone X-ray nanoscale imaging technique [2], but the computational reconstruction of real-space images from diffraction data faces several limitations and tradeoffs. Classical iterative algorithms such as the Ptychographic Iterative Engine (PIE) require $\sim$60–70% scan overlap for robust convergence and process only $\sim$0.1–1 diffraction patterns per second on standard hardware [3, 4]; even GPU-accelerated solvers struggle to keep pace with high-repetition-rate sources [5, 6].

Supervised machine learning (ML) approaches have been introduced to accelerate reconstruction by moving from iterative optimization-based procedures to single-shot inference using trained models. These approaches can accelerate inference but are often limited by poor generalization and the need for large labeled training sets generated by iterative solvers [6, 7]. Moreover, single-frame supervised methods cannot exploit overlap redundancy, failing outright when overlap constraints are required. In short, neither conventional methods nor direct supervised inversion unifies speed, resolution, and flexible handling of real-space constraints.
Beyond supervised direct inversion techniques, recent developments in machine learning-based phase retrieval for ptychography include hybrid physics-learning approaches (e.g., deep-prior regularization and learned accelerators within iterative solvers) [8, 9], implicit neural representations including sinusoidal representation network (SIREN)-style parameterizations [10], learned probe-position correction for large scan errors [11], and unrolled transformer-based ptychography networks [12]. Related unsupervised physics-aware inversion has also been demonstrated for 3D Bragg CDI (AutoPhaseNN) [13]. Within ptychography and extended-sample CDI, no single prior approach has jointly demonstrated reusable pre-trained inference, label-free training, and operation without strict overlap constraints.

In this context, we address several limitations of prior approaches with a physics-constrained, self-supervised framework: a trainable inverse-mapping network is composed with a differentiable forward simulator of coherent scattering, and the full system is optimized end-to-end as an autoencoder using diffraction-domain losses (Poisson photon-counting likelihood [14, 15]). A key property of this formulation is that real-space redundancy is treated as a configurable parameter rather than a hard requirement. Specifically, the number of simultaneously reconstructed coherent scattering shots can be dialed to match the acquisition regime, including the overlap-free setting. In practice, when a curved or defocused probe provides sufficient phase diversity, the diffraction-domain likelihood alone can anchor reconstruction and spatial redundancy can be reduced to zero. This is the principle underlying Fresnel CDI [16, 17]. We use "single-shot" throughout in the limited sense of a single diffraction measurement with a structured probe (without lateral scanning, beam multiplexing [18, 19], or downstream modulators [20, 21]).
Our previous work [22] demonstrated this physics-constrained approach on synthetic data; here, we extend it to realistic probes, arbitrary scan geometries, and single-shot reconstruction. We evaluate the model under both typical and non-ideal conditions, including low photon dose and large position jitter, and demonstrate good performance on experimental data from the Advanced Photon Source (APS) and the Linac Coherent Light Source (LCLS). Specifically, this work demonstrates:

1. self-supervised reconstruction of experimental data (APS, LCLS) at $\sim 6.1 \times 10^3$ diffraction patterns/s;
2. overlap-free, single-shot reconstruction in Fresnel CDI geometry;
3. dose-efficient imaging via Poisson likelihood at $\sim 10^4$ photons/frame;
4. an order-of-magnitude improvement in data efficiency over a supervised baseline with the same network architecture.

In this study all reconstructions are performed in overlap-free single-shot mode, except in explicitly labeled overlap ablations.

2. Methods and Architecture

2.1. Formulation and Forward Model

We learn an inverse map $G: X \to Y$ from diffraction space to real space and optimize it by composing with a differentiable forward model $F: Y \to X$. The overall autoencoder is $F \circ G$, trained to match measured diffraction statistics without ground-truth images.

Data model and notation. Each training sample comprises $C_g$ diffraction amplitude images $\{x_k\}_{k=1}^{C_g}$ acquired at probe coordinates $\{\vec{r}_k\}_{k=1}^{C_g}$. The network $G(x, r)$ outputs $C_g$ complex object patches $\{O_k\}_{k=1}^{C_g}$ on an $N \times N$ grid. We use:

• $\mathcal{T}_{\Delta\vec{r}}[\cdot]$: real-space translation by $\Delta\vec{r}$,
• $\mathrm{Pad}[\cdot]$: zero-padding to a canvas large enough to contain all translated patches,
• $\mathrm{Pad}_{N/4}[\cdot]$: zero-padding that embeds a central $N/2 \times N/2$ tile into an $N \times N$ grid,
• $\mathrm{Crop}_N[\cdot]$: center-cropping to $N \times N$,
• $\mathbf{1}$: an all-ones array of appropriate size,
• $\odot$: elementwise (Hadamard) product.

Fig. 1. Reconstruction comparison across probe types and acquisition modes. Rows: idealized probe (Gaussian-smoothed disk, uniform phase) vs semi-synthetic (experimental probe, synthetic object). Columns: single-shot CDI vs overlapped ptychography. Panels: (a) idealized, CDI; (b) idealized, ptychography; (c) semi-synthetic, CDI; (d) semi-synthetic, ptychography.

Constraint map ($F_c$): translation-aware merging. To enforce overlap consistency, per-patch reconstructions are merged in a translation-aligned frame:

$$O_{\mathrm{region}}(\vec{r}) = \frac{\sum_{k=1}^{C_g} \mathcal{T}_{-\vec{r}_k}\left[\mathrm{Pad}(O_k)\right]}{\sum_{k=1}^{C_g} \mathcal{T}_{-\vec{r}_k}\left[\mathrm{Pad}(\mathbf{1})\right] + \epsilon}, \qquad \epsilon = 10^{-3}. \tag{1}$$

This "translational pooling" applies to arbitrary scan geometries.

Coordinate-aware grouping. Training groups are formed locally by nearest-neighbor sampling. For each anchor $\vec{r}_i$, let $\mathcal{N}_K(\vec{r}_i)$ be its $K$ nearest distinct neighbors. A group $\mathcal{G}_{i,j}$ draws $C_g - 1$ neighbors uniformly without replacement:

$$\mathcal{G}_{i,j} = \{\vec{r}_i\} \cup S_{i,j}, \qquad S_{i,j} \subset \mathcal{N}_K(\vec{r}_i), \qquad |S_{i,j}| = C_g - 1,$$

repeated $n_{\mathrm{samples}}$ times per anchor. If duplicate neighbor sets are disallowed, the effective number of distinct groups per anchor is

$$n_{\mathrm{eff}} = \min\left(n_{\mathrm{samples}}, \binom{K}{C_g - 1}\right),$$

so the total number of training examples is $N_{\mathrm{scan}} \times n_{\mathrm{eff}}$, with the combinatorial upper bound $N_{\mathrm{scan}} \binom{K}{C_g - 1}$. Choosing $n_{\mathrm{samples}} > 1$ augments the dataset through combinatorial re-grouping while preserving local spatial consistency. Coordinates within each group are expressed in a stable local frame by re-centering to the group centroid:

$$\vec{r}_{\mathrm{global}} = \frac{1}{C_g} \sum_{k=1}^{C_g} \vec{r}_k, \qquad \vec{r}^{\,\mathrm{rel}}_k = \vec{r}_k - \vec{r}_{\mathrm{global}}.$$

Diffraction map ($F_d$): coherent scattering. Given $O_{\mathrm{region}}$, the $k$th translated object patch and exit wave are

$$O'_k(\vec{r}) = \mathrm{Crop}_N\left[\mathcal{T}_{\vec{r}^{\,\mathrm{rel}}_k}\left(O_{\mathrm{region}}\right)\right], \tag{2}$$

$$\Psi_k = \mathcal{F}\left[O'_k(\vec{r}) \cdot P(\vec{r})\right], \tag{3}$$

where $P(\vec{r})$ is the (estimated) probe and $\mathcal{F}$ is the 2D Fourier transform.
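The constraint and diffraction maps of Eqs. (1)-(3) can be illustrated with a minimal NumPy sketch. It is not the PtychoPINN implementation: the function names are hypothetical, offsets are assumed to be integer pixel shifts relative to the group centroid, the translation sign convention is chosen only so that merging and extraction are mutually consistent, and an orthonormal FFT is assumed for $\mathcal{F}$.

```python
import numpy as np

def merge_patches(patches, offsets, eps=1e-3):
    """Eq. (1): average translated, zero-padded patches on a shared canvas.

    patches: (C_g, N, N) complex array.
    offsets: (C_g, 2) integer (row, col) shifts relative to the group centroid
             (integer shifts are an assumption of this sketch).
    """
    _, n, _ = patches.shape
    offsets = np.asarray(offsets, dtype=int)
    margin = int(np.abs(offsets).max())
    m = n + 2 * margin                       # canvas size M
    num = np.zeros((m, m), dtype=complex)    # sum of translated patches
    den = np.zeros((m, m), dtype=float)      # sum of translated all-ones arrays
    for patch, (dy, dx) in zip(patches, offsets):
        y0, x0 = margin + dy, margin + dx
        num[y0:y0 + n, x0:x0 + n] += patch
        den[y0:y0 + n, x0:x0 + n] += 1.0
    return num / (den + eps), margin

def extract_patch(region, offset, n, margin):
    """Eq. (2): move to the k-th scan position and crop an N x N patch."""
    dy, dx = offset
    y0, x0 = margin + dy, margin + dx
    return region[y0:y0 + n, x0:x0 + n]

def diffraction_amplitude(patch, probe):
    """Modulus of Eq. (3): far-field amplitude of the exit wave patch * probe."""
    psi = np.fft.fftshift(np.fft.fft2(patch * probe, norm="ortho"))
    return np.abs(psi)
```

When the input patches are mutually consistent cuts of one object, merging followed by extraction returns each patch up to the small bias introduced by $\epsilon$; inconsistent patches are averaged where they overlap, which is how overlap acts as a constraint.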
Predicted detector-plane amplitudes include a global intensity scale $e^{\alpha_{\log}}$ that links normalized network outputs to physical photon counts:

$$\hat{A}_k = |\Psi_k| \, e^{\alpha_{\log}}. \tag{4}$$

2.2. Data Preprocessing

A dataset consists of diffraction images from one or more objects measured with a fixed probe illumination $P$. After grouping images into samples of $C_g$ diffraction patterns each (Section 2.1), we normalize the raw diffraction amplitudes:

$$x_k = x'_k \cdot \sqrt{\frac{(N/2)^2}{\left\langle \sum_{i,j} |x'_{ij}|^2 \right\rangle}}, \tag{5}$$

where $x'$ denotes raw measurements and the average $\langle\cdot\rangle$ is over all images in the dataset. By Parseval's theorem, unit-amplitude real-space objects produce diffraction power of approximately $N^2/4$, so this normalization maps experimental amplitude images to internal neural-network activations of order unity. Additionally, we introduce a trainable scalar $\alpha_{\log}$ that converts between the dimensionless internal model activations and absolute per-pixel integrated amplitudes. The final, scaled network input is $x_{\mathrm{in}} = x \cdot e^{-\alpha_{\log}}$.

2.3. Neural Network Architecture

The inverse map $G$ follows an encoder-decoder design (as in [22]; see also [23] for a PyTorch implementation and novel training procedures), conditioned on $\{x_k\}_{k=1}^{C_g}$ and $\{\vec{r}^{\,\mathrm{rel}}_k\}_{k=1}^{C_g}$, and outputs complex patches $\{O_k\}_{k=1}^{C_g}$. To respect oversampling while avoiding probe truncation artifacts, the decoder allocates most capacity to the central, well-posed region and a lightweight continuation to the periphery.

Handling extended probes. Convolutional neural network (CNN) architectures are limited to modest dimensions ($N \le 128$) because convolutional receptive fields capture long-range interactions only inefficiently in this Fourier inversion setting, and we must furthermore restrict high-resolution reconstruction to the central $N/2 \times N/2$ region to satisfy oversampling conditions [24]. Probes with extended tails force inefficient use of this limited number of pixels because the real-space area brightly illuminated by the probe is small compared to the total probe area that must be represented to avoid truncation artifacts from non-zero amplitude at the edge of the real-space grid.

Consequently, given the modest magnitude of $N$, fully inscribing the probe, tails included, within the central $N/2 \times N/2$ pixels may require too much binning. This causes a dilemma: one must choose between truncation artifacts (and possible lack of convergence due to the associated physical inconsistency) and violation of the diffraction-space oversampling condition for coherent imaging. We resolve this by reconstructing the object at high resolution in the central $N/2 \times N/2$ region of the real-space grid and at low resolution in the periphery. Presuming the absence of high spatial frequency components in the probe tail, extending the probe-times-object reconstruction into the periphery does not compromise well-posedness of the inverse problem.
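The geometry of this center/periphery split can be pictured with a short NumPy sketch. It illustrates only the array bookkeeping, under stated assumptions: `pad_quarter`, `border_mask` and `compose` are hypothetical names, $N$ is assumed divisible by 4, and nearest-neighbor upsampling stands in for the learned upsampling layer of the actual decoder.

```python
import numpy as np

def pad_quarter(tile):
    """Embed an (N/2 x N/2) tile in the center of an N x N grid of zeros."""
    q = tile.shape[0] // 2                 # N/4 pixels of padding on each side
    return np.pad(tile, q)

def border_mask(n):
    """Binary mask: 1 in the periphery, 0 in the central N/2 x N/2 region."""
    mask = np.ones((n, n))
    q = n // 4
    mask[q:n - q, q:n - q] = 0.0
    return mask

def compose(central_hi, border_lo):
    """High-resolution center plus a coarse, upsampled periphery.

    central_hi: (N/2, N/2) array at full resolution.
    border_lo:  (N/f, N/f) coarse array covering the whole grid.
    """
    n = 2 * central_hi.shape[0]
    factor = n // border_lo.shape[0]
    # Nearest-neighbor upsampling; a stand-in for a learned upsampling layer.
    border_up = np.kron(border_lo, np.ones((factor, factor)))
    return pad_quarter(central_hi) + border_up * border_mask(n)
```

The mask guarantees that the coarse branch contributes nothing inside the central region, so the oversampled, well-posed part of the reconstruction is determined by the high-resolution branch alone.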
Concretely, we split the penultimate decoder layer's channels into a majority set for the central region and the remaining 4 channels to coarsely reconstruct the periphery:

$$O_{\mathrm{amp}} = \mathrm{Pad}_{N/4}\left[\sigma_A\left(\mathrm{Conv}(H^{\mathrm{central}}_A)\right)\right] + \sigma_A\left(\mathrm{ConvUp}(H^{\mathrm{border}}_A)\right) \odot M_{\mathrm{border}}, \tag{6}$$

$$O_{\mathrm{phase}} = \mathrm{Pad}_{N/4}\left[\pi \tanh\left(\mathrm{Conv}(H^{\mathrm{central}}_\phi)\right)\right] + \pi \tanh\left(\mathrm{ConvUp}(H^{\mathrm{border}}_\phi)\right) \odot M_{\mathrm{border}}, \tag{7}$$

$$O_k = O_{\mathrm{amp}} \cdot \exp\left(i \, O_{\mathrm{phase}}\right), \tag{8}$$

where $H^{\mathrm{central}}_{\{\cdot\}}$ targets the central region, $H^{\mathrm{border}}_{\{\cdot\}}$ (the last 4 channels) produces a low-resolution continuation, and $M_{\mathrm{border}}$ is a binary mask that isolates the boundary contributions to the outer region. This modification avoids artifacts from truncation of the exit wave and enables stable reconstruction with experimentally realistic probes.

2.4. Training Objective and Optimization

Poisson negative log-likelihood (NLL). The training procedure optimizes the inverse map $G$ using a negative log-likelihood loss under Poisson statistics:

$$\mathcal{L}_{\mathrm{Poiss}} = -\sum_{k,i,j} \log f_{\mathrm{Poiss}}(N_{kij}; \lambda_{kij}) = \sum_{k,i,j} \left(\lambda_{kij} - N_{kij} \log \lambda_{kij}\right) + \mathrm{const}, \tag{9}$$

where $N_{kij} = |x'_{kij}|^2$ is the measured photon count, $\lambda_{kij} = |\hat{A}_{kij}|^2$ is the predicted count, and the constant collects the $\log N_{kij}!$ terms, which do not depend on the model. Since the network operates on normalized inputs (Eq. 5) for numerical stability, a scale parameter $e^{\alpha_{\log}}$ bridges normalized and physical units. When the mean photon flux $N_{\mathrm{photons}}$ is known, we initialize:

$$e^{\alpha_{\log}} \leftarrow \frac{2\sqrt{N_{\mathrm{photons}}}}{N}. \tag{10}$$

This ensures predicted intensities match measurement statistics: a normalized pattern with total power $(N/2)^2$ is rescaled to a total of $N_{\mathrm{photons}}$ counts. The parameter $e^{\alpha_{\log}}$ may be fixed or learned (see Table 3); learning it can absorb modest calibration errors.

Amplitude loss for unknown counts. For datasets lacking absolute photon counts, we resort to mean absolute error (MAE) on normalized amplitudes:

$$\mathcal{L}_{\mathrm{MAE}} = \sum_{k,i,j} \left| x_{kij} - \hat{A}_{kij} \, e^{-\alpha_{\log}} \right|.$$
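The two diffraction-domain objectives and the scale initialization of Eq. (10) can be written out as a minimal NumPy sketch. The function names and the small `tiny` regularizer inside the logarithm are assumptions made here for illustration; they are not taken from the PtychoPINN code.

```python
import numpy as np

def init_log_scale(n_photons, n):
    """Eq. (10): alpha_log such that exp(alpha_log) = 2 * sqrt(N_photons) / N."""
    return np.log(2.0 * np.sqrt(n_photons) / n)

def poisson_nll(counts, pred_amp, tiny=1e-12):
    """Eq. (9) without the model-independent log(N!) term.

    counts:   measured photon counts N_kij.
    pred_amp: predicted amplitudes, so that lambda_kij = pred_amp**2.
    tiny:     guard against log(0); an assumption of this sketch.
    """
    lam = pred_amp ** 2
    return np.sum(lam - counts * np.log(lam + tiny))

def amplitude_mae(x_norm, pred_amp, alpha_log):
    """Summed absolute error between normalized data and rescaled predictions."""
    return np.sum(np.abs(x_norm - pred_amp * np.exp(-alpha_log)))
```

As a consistency check on Eq. (10): a normalized amplitude image with total power $(N/2)^2$, multiplied by $e^{\alpha_{\log}}$, has total intensity $(N^2/4) \cdot 4 N_{\mathrm{photons}} / N^2 = N_{\mathrm{photons}}$. Per pixel, $\lambda - N \log \lambda$ is minimized at $\lambda = N$, so the Poisson objective is smallest when predicted and measured counts agree.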
In the results reported here we do not use any real-space loss; training is driven solely by the diffraction-domain losses (Poisson NLL or MAE).

Implementation notes. All operators in $F_c$ and $F_d$ are differentiable and implemented with padding-aware translations and fast Fourier transform (FFT)-based diffraction. Batching is performed over groups $\mathcal{G}_{i,j}$; nearest-neighbor sampling with $n_{\mathrm{samples}} > 1$ provides dataset augmentation while preserving local spatial consistency. Default architectural and training hyperparameters are summarized in Table 3.

2.5. Supervised Baseline

The supervised baseline uses the same encoder-decoder backbone and input representation as PtychoPINN (cf. PtychoNN [7]). It is trained with direct real-space supervision on paired diffraction/reference-reconstruction data, without enforcing the differentiable forward model in the training loss. Data splits, normalization, and scan-coordinate conditioning are matched to the PtychoPINN runs so the comparison isolates training paradigm rather than architecture.

2.6. Datasets and Evaluation Protocol

We evaluate on four datasets:

• an APS Velociprobe Siemens-star dataset;
• an LCLS X-ray Pump-Probe (XPP) test pattern dataset (hereafter, LCLS XPP dataset);
• a synthetic Siemens-star dataset simulated from APS Siemens-star reconstructions (ground truth for Table 1);
• a synthetic line-pattern dataset of randomly oriented high-aspect-ratio features from [22] (used for the overlap ablation in Table 2).

APS and LCLS experiments are run in single-shot mode (one diffraction frame per group), except where overlap ablations are explicitly labeled. For the Siemens-star experiments, we use a spatial holdout: the top half of the scan is used for training and the bottom half for testing. For out-of-distribution transfer, models trained on APS data are evaluated on LCLS data without retraining, with beamline-specific forward parameters (probe/geometry) substituted at inference.
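The spatial holdout and the coordinate-aware grouping of Section 2.1 can be combined as in the following sketch. It is illustrative only: the function names are hypothetical, "top" is assumed to mean smaller row coordinate, and the brute-force distance matrix is adequate only for small scans.

```python
from math import comb
import numpy as np

def spatial_holdout(coords):
    """Split scan positions into top (train) and bottom (test) halves by row."""
    y = np.asarray(coords, dtype=float)[:, 0]
    mid = np.median(y)
    return np.flatnonzero(y < mid), np.flatnonzero(y >= mid)

def build_groups(coords, c_g=4, k=4, n_samples=1, seed=0):
    """Index groups: each anchor plus C_g - 1 of its K nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    if c_g == 1:                                   # overlap-free: one frame per group
        return np.arange(len(coords))[:, None]
    rng = np.random.default_rng(seed)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                   # the anchor is not its own neighbor
    neighbours = np.argsort(d2, axis=1)[:, :k]     # K nearest distinct neighbors
    n_eff = min(n_samples, comb(k, c_g - 1))       # distinct groups per anchor
    groups = []
    for i, nbrs in enumerate(neighbours):
        seen = set()
        while len(seen) < n_eff:                   # reject duplicate neighbor sets
            pick = rng.choice(nbrs, size=c_g - 1, replace=False)
            seen.add(tuple(sorted(int(j) for j in pick)))
        groups.extend((i, *pick) for pick in sorted(seen))
    return np.array(groups)

def relative_coords(coords, group):
    """Re-center a group's coordinates on its centroid."""
    pts = np.asarray(coords, dtype=float)[list(group)]
    return pts - pts.mean(axis=0)
```

Grouping is applied after the split, so that no training group borrows neighbors from the held-out half of the scan. The number of rows returned by `build_groups` is $N_{\mathrm{scan}} \times n_{\mathrm{eff}}$, matching the count given in Section 2.1.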
3. Results

We report results on the APS Velociprobe Siemens-star data, the LCLS XPP dataset, the synthetic Siemens-star dataset, and the synthetic line-pattern dataset; see Methods for dataset definitions and evaluation protocol.

3.1. Reconstruction Quality

Figure 2 compares reconstructions on the APS Siemens-star data across two sampling budgets (512 and 8192 diffraction patterns), using a spatial holdout where the top half of the scan is used for training and the bottom half for testing. At 8192 patterns (Fig. 2b), the supervised baseline reconstructs training-region data well but degrades on held-out positions, whereas PtychoPINN maintains consistent quality across both. At 512 patterns, this train-test gap widens further for the supervised baseline. On the synthetic Siemens-star dataset (simulated from APS Siemens-star reconstructions), PtychoPINN also attains higher phase fidelity than the supervised baseline; see Table 1.

Table 1. Reconstruction quality metrics at maximum training set size (16,384 images): peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). Values shown are mean ± standard deviation across 5 trials. Best values per dataset are marked with an asterisk (*).

Dataset                 Method               PSNR amplitude (dB)  PSNR phase (dB)  SSIM amplitude   SSIM phase
Synthetic Siemens-star  Supervised baseline  84.83 ± 0.23         68.62 ± 0.02     0.930 ± 0.002    0.912 ± 0.003
Synthetic Siemens-star  PtychoPINN           85.53 ± 0.02*        70.54 ± 0.06*    0.955 ± 0.001*   0.962 ± 0.001*

3.2. Overlap-Free Reconstruction

In overlap-free operation, we set the group size to a single diffraction frame ($C_g = 1$), removing overlap-based real-space consistency. Reconstruction then relies entirely on the diffraction likelihood and the known probe structure (defocused probe/Fresnel geometry). Figure 1 illustrates this single-frame mode compared with multi-position ptychography.
Quantitative comparisons across overlap and probe-structuring variants on a synthetic line-pattern dataset are summarized in Table 2 (overlap-free $C_g = 1$ vs overlap $C_g = 4$). With an experimental probe, removing overlap reduces amplitude SSIM by 0.064 (0.968 to 0.904) and PSNR by 4.14 dB (73.03 to 68.89).

Table 2. Synthetic line-pattern amplitude reconstruction metrics on the test split. Ground truth is the simulated object (amplitude only; the object has constant zero phase).

Case                                           PSNR (dB)  SSIM
Overlap-free ($C_g = 1$) + idealized probe     60.67      0.620
Overlap-free ($C_g = 1$) + experimental probe  68.89      0.904
Overlap ($C_g = 4$) + idealized probe          71.34      0.952
Overlap ($C_g = 4$) + experimental probe       73.03      0.968

3.3. Photon-Limited Performance

Figure 3 compares resolution using the 50% Fourier ring correlation criterion (FRC50) as a function of photon dose for Poisson NLL versus MAE training objectives. At low dose ($\sim 10^4$ photons/frame), the Poisson NLL achieves comparable resolution to MAE at roughly 10× higher dose, corresponding to an order-of-magnitude improvement in dose efficiency. This advantage arises because the Poisson likelihood correctly models photon-counting noise, preserving sensitivity to the low-count, high-$q$ components that carry fine spatial detail but are overwhelmed by bright-pixel residuals under an amplitude MAE.

Fig. 2. Comparison of reconstruction quality with different numbers of diffraction patterns. (a) 512 diffraction patterns of the Siemens star test pattern. (b) 8192 diffraction patterns of the Siemens star test pattern.

3.4. Data Efficiency

Figure 4 illustrates the reconstruction quality (phase SSIM) as a function of dataset size. PtychoPINN maintains high fidelity (SSIM > 0.85) from as few as 1024 diffraction patterns. In contrast, the supervised baseline degrades rapidly below 2048 samples.
At small training set sizes, PtychoPINN achieves comparable quality using roughly an order of magnitude less training data. This suggests that the physical constraints enforced by the training procedure act as an effective prior for this inverse-imaging task.

Fig. 4. Structural similarity of PtychoPINN and the supervised baseline as a function of training set size.

3.5. Out-of-distribution Generalization

Figure 5 compares an in-distribution LCLS control (train LCLS XPP, test LCLS XPP) with an out-of-distribution transfer setting (train APS, test LCLS XPP). Under the APS → LCLS shift, the supervised baseline largely collapses, whereas PtychoPINN preserves edge structure, albeit with visible phase artifacts. The reference column is an extended ptychographic iterative engine (ePIE) reconstruction of the LCLS data.

3.6. Computational Performance

PtychoPINN processes approximately 6.1k diffraction patterns per second at 64 × 64 image resolution and 2.6k patterns per second at 128 × 128 in single-GPU inference measurements, excluding stitching/reassembly time. As a high-performance conventional baseline, we benchmarked LSQ-ML with pty-chi [25] at 128 × 128 (batch size 96) and measured 1.444 s per epoch over 10,304 frames. Assuming 100 iterations for convergence, this corresponds to $10{,}304 / (100 \times 1.444) \approx 71.36$ frames/s. At matched 128 × 128 resolution, PtychoPINN therefore provides an approximately 40× throughput advantage over LSQ-ML ($2.6 \times 10^3 / 71.36 \approx 36$).

Fig. 3. Photon-limited performance for two self-supervised PtychoPINN variants trained with mean absolute error (MAE) and Poisson negative log-likelihood (NLL) reconstruction penalties. (a) Reconstruction comparison at $10^9$ and $10^4$ photons for MAE versus Poisson NLL objectives (left: representative diffraction patterns). (b) Resolution (FRC50) as a function of on-sample photon dose.

4. Discussion

Overlap-free reconstruction. Table 2 reveals a clear interaction between probe structure and overlap. With the idealized (flat-phase) probe, removing overlap ($C_g = 1$) drops amplitude SSIM from 0.952 to 0.620; with the experimental (curved) probe, the same change yields 0.968 to 0.904. Probe curvature largely compensates for the loss of overlap-based redundancy, consistent with the expected role of structured phase diversity in Fresnel CDI. These trends indicate that overlap and probe diversity are partially substitutable sources of constraint, but their interaction warrants further study across a broader range of probe geometries.

Making overlap a tunable parameter rather than a hard requirement has concrete implications for experimental design. Scans can use fewer positions, less overlap, or, in the Fresnel regime, no scanning at all, reducing acquisition time and total dose. The framework is also more tolerant of position jitter than overlap-dependent methods, since the reconstruction does not rely on precise inter-frame registration to enforce real-space consistency. Together, these properties are particularly relevant for dynamic or radiation-sensitive samples at high-rate sources, where overlap requirements, scan precision, and photon budget are simultaneous constraints.

Diffraction-space supervision. The forward model provides dense physical constraint per measurement: each diffraction pattern encodes the full exit-wave amplitude, and the Poisson NLL correctly weights every detector pixel, including the low-count, high-$q$ pixels where fine spatial detail resides (Fig. 3). By contrast, real-space supervision constrains the network against a single reference reconstruction that already carries the ambiguities intrinsic to the inverse problem, such as global phase offsets.
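The global-phase ambiguity mentioned above is easy to verify numerically. The sketch below uses a toy object and a toy probe with quadratic phase (both hypothetical, with arbitrary parameter values) to show that a global phase offset leaves the diffraction amplitudes unchanged while producing a large pixel-wise difference in real space.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
# Toy complex object: random amplitude in [0.5, 1) and random phase.
obj = rng.uniform(0.5, 1.0, (n, n)) * np.exp(1j * rng.uniform(-1.0, 1.0, (n, n)))

# Toy probe: Gaussian envelope with quadratic (defocus-like) phase.
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
r2 = xx ** 2 + yy ** 2
probe = np.exp(-r2 / (2 * 6.0 ** 2)) * np.exp(1j * 0.05 * r2)

def amplitude(o):
    """Far-field diffraction amplitude of the exit wave o * probe."""
    return np.abs(np.fft.fft2(o * probe))

shifted = obj * np.exp(1j * 0.7)              # same object, global phase offset
same_data = np.allclose(amplitude(obj), amplitude(shifted))
real_space_gap = np.abs(obj - shifted).mean() # what a pixel-wise loss would see
```

Here `same_data` is true because the modulus of the Fourier transform is invariant under multiplication by a unit-modulus constant, while `real_space_gap` is far from zero. A diffraction-domain loss is therefore indifferent to the offset, whereas a real-space loss penalizes it according to whichever phase the reference reconstruction happened to adopt.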
Because these nuisance parameters are not uniquely determined by the data, a supervised network can overfit to them, which likely explains both the supervised baseline's train-test gap on held-out scan positions (Fig. 2) and its collapse under cross-facility transfer (Fig. 5). Data efficiency follows from the same mechanism: the forward-model constraint is far more informative per sample than a pixel-wise real-space loss, so the network converges with roughly an order of magnitude fewer training patterns (Fig. 4).

Fig. 5. Comparison of methods for an in-distribution LCLS control (train LCLS XPP, test LCLS XPP) and out-of-distribution transfer (train APS, test LCLS XPP). The reference column shows an ePIE reconstruction of the LCLS data. Panel labels: Reference (ePIE), PtychoPINN, Supervised; Out-of-Distribution (Train A → Test B) and In-Distribution (Train B → Test B), with A = APS 2-ID and B = LCLS XPP.

Open problems. The main methodological limitation is the fixed-probe assumption: the current formulation uses a pre-estimated probe and fixed scan coordinates during training, so it does not correct probe drift or position errors. A direct extension is to jointly refine probe and position parameters within the same self-supervised loop.

The framework is modular: inverse backbone, differentiable forward model, and loss are separable components. This design should allow further speedups from mixed precision and architecture-level optimization without changing the architecture or training procedure. At higher resolution, the dominant scaling bottleneck is the CNN inverse backbone. Replacing it with a Fourier neural operator (FNO) backbone is a likely next step, because global spectral mixing is expected to scale better with image size $N$ and improve high-resolution reconstruction quality. The same modular structure should also simplify adaptation to other coherent imaging geometries, including Bragg CDI.

5. Conclusions

We presented an extended PtychoPINN framework that unifies overlap-free single-shot Fresnel coherent diffraction imaging and overlapped ptychography within a single self-supervised formulation. The method combines a differentiable coherent-scattering forward model with diffraction-domain training losses and supports arbitrary scan geometries through coordinate-aware grouping. Across APS and LCLS experiments, we measured approximately $6.1 \times 10^3$ diffraction patterns/s at 64 × 64 and $2.6 \times 10^3$ at 128 × 128 in single-GPU inference. In overlap ablations on synthetic line-pattern data with an experimental probe, overlap-free reconstruction reached amplitude SSIM 0.904 versus 0.968 for overlap-constrained reconstruction. In photon-limited regimes, Poisson NLL training improved dose efficiency by roughly an order of magnitude relative to MAE at comparable FRC50. Relative to a supervised baseline with the same backbone, the method maintained high quality with substantially fewer training samples. Future work will focus on joint probe/position refinement and higher-capacity inverse-mapping neural network backbones for large-image reconstructions.

Appendix A: Key Configuration Parameters

These parameters control critical aspects of the reconstruction process and should be tuned based on experimental conditions and computational constraints.

Table 3. Model parameters, default code values, and settings used for the APS/LCLS experiments in this paper.

Parameter                  Default  Description
N                          64       Patch dimension (pixels)
C_g                        1        Patterns per group (code default: 4)
K                          4        Nearest neighbors for scan-position grouping
pad_object                 True     Restrict object to $N/2 \times N/2$ for oversampling
probe.mask                 False    Apply circular mask to probe
gaussian_smoothing_sigma   0.0      Gaussian smoothing $\sigma$ applied to probe illumination
intensity_scale.trainable  True     Whether $\alpha_{\log}$ is optimized during training
n_filters_scale            2        Network width multiplier
amp_activation             sigmoid  Amplitude decoder activation
offset                     4        Scan step size (pixels)
$d$                        3-5      Encoder depth (resolution-dependent)

Table 4. Symbol definitions. $N$: patch dimension, $C_g$: patches per group, $M$: merged region size.

Symbol                        Type / Structure              Description
$x'$                          Set of $C_g$ real images      Raw diffraction patterns for one sample
$x$                           Set of $C_g$ real images      Normalized diffraction patterns for one sample
$\vec{r}_k$                   2D position vector            Absolute scan position for the $k$-th image within a sample
$\vec{r}_{\mathrm{global}}$   2D position vector            Centroid of a solution region (group of scans)
$\vec{r}^{\,\mathrm{rel}}_k$  2D offset vector              Relative scan offset within a solution region
$e^{\alpha_{\log}}$           Scalar (trainable or fixed)   Log-intensity scale parameter
$N_{\mathrm{photons}}$        Scalar                        Target average total photons per diffraction pattern
$P(\vec{r})$                  $N \times N$ complex array    Effective probe function
$O_k$                         $N \times N$ complex array    $k$-th object patch decoded by the network $G$
$O_{\mathrm{region}}$         $M \times M$ complex array    Merged object representation for a solution region
$O'_k$                        $N \times N$ complex array    Object patch extracted from $O_{\mathrm{region}}$ for the forward model
$\Psi_k$                      $N \times N$ complex array    Predicted complex wavefield at the detector
$\hat{A}_k$                   $N \times N$ real array       Predicted final diffraction amplitude for one patch
$\lambda_{kij}$               Scalar                        Poisson rate parameter for a single pixel

Funding. This work was supported by the U.S. Department of Energy, Laboratory Directed Research and Development program at SLAC National Accelerator Laboratory, under Contract No. DE-AC02-76SF00515.

Disclosures. The authors declare no conflicts of interest.

Data availability. Data and code supporting this study are available from the corresponding author upon reasonable request. The PtychoPINN source code is available at https://github.com/hoidn/PtychoPINN.

References

1. SLAC National Accelerator Laboratory, "LCLS-II-HE: Design and Performance," https://lcls.slac.stanford.edu/lcls-ii-he/design-and-performance (2023). Accessed: 2025-08-14.
2. M. Guizar-Sicairos and P. Thibault, "Ptychography: A solution to the phase problem," Phys. Today 74, 42–48 (2021).
3. O. Bunk, M. Dierolf, S. Kynde, et al., "Influence of the overlap parameter on the convergence of the ptychographical iterative engine," Ultramicroscopy 108, 481–487 (2008).
4. A. M. Maiden and J. M. Rodenburg, "An improved ptychographical phase retrieval algorithm for diffractive imaging," Ultramicroscopy 109, 1256–1262 (2009).
5. S. Marchesini, H. Krishnan, B. J. Daurer, et al., "SHARP: a distributed GPU-based ptychographic solver," J. Appl. Crystallogr. 49, 1245–1252 (2016).
6. A. V. Babu, T. Zhou, S. Kandel, et al., "Deep learning at the edge enables real-time streaming ptychographic imaging," Nat. Commun. 14, 7059 (2023).
7. M. J. Cherukara, T. Zhou, Y. S. G. Nashed, et al., "AI-enabled high-resolution scanning coherent diffraction imaging," Appl. Phys. Lett. 117, 044103 (2020).
8. C. A. Metzler, P. Schniter, A. Veeraraghavan, and R. G. Baraniuk, "prDeep: Robust phase retrieval with a flexible deep network," in Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research (2018), pp. 3501–3510.
9. A. R. C. McCray, S. M. Ribet, G. Varnavides, and C. Ophus, "Accelerating iterative ptychography with an integrated neural network," J. Microsc. 300, 180–190 (2025).
10. V. Sitzmann, J. N. P. Martel, A. W. Bergman, et al., "Implicit neural representations with periodic activation functions," in Advances in Neural Information Processing Systems, vol. 33 (2020), pp. 7462–7473.
11. M. Du, T. Zhou, J. Deng, et al., "Predicting ptychography probe positions using single-shot phase retrieval neural network," Opt. Express 32, 36757–36780 (2024).
12. W. Gan, Q. Zhai, M. T. McCann, et al., "PtychoDV: Vision transformer-based deep unrolling network for ptychographic image reconstruction," IEEE Open J. Signal Process. 5, 539–547 (2024).
13. Y. Yao, H. Chan, S. K. R. S. Sankaranarayanan, et al., "AutoPhaseNN: unsupervised physics-aware deep learning of 3D nanoscale Bragg coherent diffraction imaging," npj Comput. Mater. 8, 124 (2022).
14. P. Thibault and M. Guizar-Sicairos, "Maximum-likelihood refinement for coherent diffractive imaging," New J. Phys. 14, 063004 (2012).
15. J. P. Seifert, Z. Chen, M.-J. Yoon, et al., "Maximum-likelihood ptychography in the presence of Poisson–Gaussian noise," Opt. Lett. 48, 4897–4900 (2023).
16. G. J. Williams, H. M. Quiney, B. B. Dhal, et al., "Fresnel coherent diffractive imaging," Phys. Rev. Lett. 97, 025506 (2006).
17. M. Stockmar, P. Cloetens, I. Zanette, et al., "Near-field ptychography: phase retrieval for inline holography using a structured illumination," Sci. Reports 3, 1927 (2013).
18. P. Sidorenko and O. Cohen, "Single-shot ptychography," Optica 3, 9–14 (2016).
19. K. Kharitonov, M. Mehrjoo, M. Ruiz-Lopez, et al., "Single-shot ptychography at a soft X-ray free-electron laser," Sci. Reports 12, 14430 (2022).
20. F. Zhang, I. Peterson, J. Vila-Comamala, et al., "Phase retrieval by coherent modulation imaging," Nat. Commun. 7, 13367 (2016).
21. X. Dong, X. Pan, C. Liu, and J. Zhu, "Single shot multi-wavelength phase retrieval with coherent modulation imaging," Opt. Lett. 43, 1762–1765 (2018).
22. O. Hoidn, A. A. Mishra, and A. Mehta, "Physics constrained unsupervised deep learning for rapid, high resolution scanning coherent diffraction reconstruction," Sci. Reports 13, 22789 (2023).
23. A. Vong, S. Henke, O. Hoidn, et al., "Towards generalizable deep ptychography neural networks," arXiv:2509.25104 (2025).
24. J. Miao, P. Charalambous, J. Kirz, and D. Sayre, "Extending the methodology of X-ray crystallography to allow imaging of micrometre-sized non-crystalline specimens," Nature 400, 342–344 (1999).
25. M. Du, H. Ruth, S. Henke, et al., "Pty-chi: A PyTorch-based modern ptychographic data analysis package," arXiv:2510.20929 (2025).
