Learning Permutation Distributions via Reflected Diffusion on Ranks
Authors: Sizhuang He*, Yangtian Zhang*, Shiyang Zhang, David van Dijk
Abstract

The finite symmetric group S_N provides a natural domain for permutations, yet learning probability distributions on S_N is challenging due to its factorially growing size and discrete, non-Euclidean structure. Recent permutation diffusion methods define forward noising via shuffle-based random walks (e.g., riffle shuffles) and learn reverse transitions with Plackett–Luce (PL) variants, but the resulting trajectories can be abrupt and increasingly hard to denoise as n grows. We propose Soft-Rank Diffusion, a discrete diffusion framework that replaces shuffle-based corruption with a structured soft-rank forward process: we lift permutations to a continuous latent representation of order by relaxing discrete ranks into soft ranks, yielding smoother and more tractable trajectories. For the reverse process, we introduce contextualized generalized Plackett–Luce (cGPL) denoisers that generalize prior PL-style parameterizations and improve expressivity for sequential decision structures. Experiments on sorting and combinatorial optimization benchmarks show that Soft-Rank Diffusion consistently outperforms prior diffusion baselines, with particularly strong gains in long-sequence and intrinsically sequential settings.

1. Introduction

Permutations are fundamental objects in a wide range of applications, from ranking search results and recommendations (Feng et al., 2021; Koren et al., 2009) to sorting and learning-to-rank (Liu, 2009) and combinatorial optimization problems such as the traveling salesperson problem (TSP). In many settings, we aim to model a distribution over permutations, and ultimately build generative models with which to sample permutations.

In continuous Euclidean domains, diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al.
, 2021; 2022) have emerged as powerful and scalable tools for modeling complex distributions like images, and related ideas have been extended beyond R^d to discrete objects such as text and graphs, a class of models known as Discrete Diffusion models (Austin et al., 2023; Lou et al., 2024; Dieleman et al., 2022; Gat et al., 2024; Gulrajani and Hashimoto, 2023; Vignac et al., 2023). Yet permutations remain a particularly challenging case: although they are discrete, the state space grows factorially with sequence length n, and the natural transitions on permutations, like card shuffling, are often abrupt: small local moves can cause discontinuous, non-differentiable changes in the induced ordering. Consequently, diffusion-style constructions that work well for other discrete domains can become brittle on permutations and scale poorly with n.

* Equal contribution. 1 Department of Computer Science, Yale University, New Haven, CT, USA. Correspondence to: David van Dijk <david.vandijk@yale.edu>. Preprint. March 19, 2026.

Developing discrete diffusion models over permutations remains relatively underexplored. Recent progress on permutation diffusion (Zhang et al., 2025) tackles the above challenges by defining the forward noising process directly on permutations, typically as a random walk induced by riffle shuffles (Gilbert, 1955), and parameterizing the reverse dynamics with the Plackett–Luce (PL) distribution (Plackett, 1975; Luce, 1959) and its generalizations (GPL) (Zhang et al., 2025). While effective on small instances, these discrete forward trajectories can be highly "jumpy": even a single riffle shuffle can simultaneously move many items across the sequence, yielding abrupt, non-smooth changes in the ordering. Performance often degrades rapidly and can even collapse in the long-n regime.
In this paper, we take a different perspective: rather than diffusing within S_N, we lift a permutation, viewed as the rank of each element, to a continuous latent representation and diffuse in that space. Concretely, we represent an ordering by assigning each item a continuous-valued soft rank z ∈ [0, 1], obtaining a soft-rank vector Z ∈ [0, 1]^n, and recover the induced permutation by sorting, σ = argsort(Z). This relaxation lets us define smooth stochastic dynamics in a continuous space, while still producing a discrete permutation at any time via a simple sorting operation.

Building on this idea, we introduce Soft-Rank Diffusion, which defines the forward noising process as a reflected diffusion bridge (Lou and Ermon, 2023; Xie et al., 2024) on [0, 1]^n and observes permutations through the induced ordering σ_t = argsort(Z_t). The latent construction also yields a reverse-time sampler: we augment the intractable discrete reverse step with an auxiliary, tractable continuous update in the soft-rank space, followed by a projection back to permutations via sorting. We further introduce contextualized GPL (cGPL) and a pointer-cGPL variant, generalizations of PL/GPL that make stagewise logits depend on the evolving prefix and the shrinking candidate set, a natural fit for intrinsically sequential and dynamic tasks such as TSP.

We evaluate Soft-Rank Diffusion on standard permutation generation benchmarks, including 4-digit MNIST sorting and TSP. Across settings, our method consistently outperforms prior permutation diffusion baselines and differentiable sorting baselines, with gains that widen as sequence lengths increase. Taken together, our results suggest that reflected diffusion in soft-rank space provides a principled and scalable route to permutation generative modeling.

Contributions.
Our main contributions are:

• We introduce Soft-Rank Diffusion, a permutation diffusion framework induced by reflected diffusion bridges in a relaxed continuous soft-rank space.
• We derive a hybrid reverse sampler that augments the intractable reverse dynamics in permutation space with tractable updates in the soft-rank space.
• We propose cGPL and pointer-cGPL, generalizations of PL/GPL that improve prefix-conditional expressivity for sequential permutation tasks.
• We demonstrate strong empirical performance on long-sequence MNIST sorting and TSP benchmarks, particularly in the large-n regime.

2. Related Works

Discrete Diffusion Models. Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2021) were initially developed for continuous data. D3PM (Austin et al., 2023) extended this framework to discrete domains by defining forward noising processes based on masking or uniform replacement. Subsequent work introduced alternative parameterizations and training objectives for discrete diffusion (Lou et al., 2024; Gat et al., 2024) and further extended discrete (both in state space and time) diffusion to continuous-time variants (Campbell et al., 2022; Sun et al., 2023; Shi et al., 2025). However, most existing methods assume a manageable or factorized state space; directly extending them to permutation-valued data is challenging because the group S_n has size n!, making naive transition representations intractable without exploiting additional structure.

Learning Permutations. Sorting algorithms can be viewed as producing a permutation of a set of items and are well understood; however, their hard discrete decisions are non-differentiable, which hinders end-to-end training when permutations are produced by, or embedded within, neural models.
This motivates differentiable sorting methods that replace discrete swaps or permutation operators with smooth surrogates, yielding a soft permutation (or rank) representation amenable to gradient-based optimization (Mena et al., 2018; Cuturi et al., 2019; Prillo and Eisenschlos, 2020; Blondel et al., 2020; Grover et al., 2019; Petersen et al., 2022).

Another widely used family is Pointer Networks (Ptr-Nets) (Vinyals et al., 2017), which parameterize permutation-valued outputs by decoding a sequence of indices into the input, rather than symbols from a fixed vocabulary. At each decoding step, an attention mechanism induces a categorical distribution over input positions, and the chosen index specifies which input element is appended next in the output ordering. Because the effective output domain scales with input length, Ptr-Nets naturally accommodate variable-sized instances and provide a natural generic parameterization for combinatorial prediction problems whose outputs are orderings (e.g., sorting and routing).

Discrete Diffusion on Permutations. Zhang et al. (2025) introduce SymmetricDiffusers, which formulates discrete diffusion directly on the finite symmetric group S_n. The forward noising dynamics are instantiated as a random walk on S_n, with the riffle shuffle (Gilbert, 1955) serving as an effective transition that mixes rapidly to enable short diffusion chains in practice.

For the reverse (denoising) parameterization, Plackett–Luce-style (PL) models are adopted and extended to a generalized Plackett–Luce (GPL) family. The PL distribution is defined as

$$p_{\mathrm{PL}}(\sigma) = \prod_{i=1}^{n} \frac{\exp s_{\sigma(i)}}{\sum_{j=i}^{n} \exp s_{\sigma(j)}}, \qquad (1)$$

i.e., it assigns a single preference score s_k to each item k and samples items without replacement according to these fixed scores. The GPL family generalizes this construction by assigning step-specific scores for each item rather than a single fixed global score.
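As a concrete illustration (ours, not the paper's implementation), Eq. (1) can be sampled either stagewise without replacement or, equivalently, via the well-known Gumbel-argsort trick; both views are sketched below with 0-based indices, treating σ(i) as the item chosen at stage i:

```python
import itertools
import numpy as np

def sample_pl(scores, rng):
    """Sample from Plackett-Luce via the Gumbel-argsort trick: sorting
    s_k + Gumbel noise in descending order is distributed exactly as PL(s)."""
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(-(scores + gumbel))

def pl_log_prob(scores, sigma):
    """log p_PL(sigma) as in Eq. (1): at stage i, item sigma[i] is chosen
    among the not-yet-selected items sigma[i:]."""
    logp = 0.0
    for i in range(len(sigma)):
        logp += scores[sigma[i]] - np.log(np.exp(scores[sigma[i:]]).sum())
    return logp

rng = np.random.default_rng(0)
s = np.array([2.0, 0.5, 1.0])
perm = sample_pl(s, rng)  # a permutation of {0, 1, 2}

# sanity check: PL probabilities over all of S_3 sum to one
total = sum(np.exp(pl_log_prob(s, np.array(p)))
            for p in itertools.permutations(range(3)))
```

The single score vector `s` is exactly the restriction that GPL lifts by allowing a different score column at every stage.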
As a result, GPL is strictly more expressive than PL; in particular, GPL is universal in the sense that it can represent arbitrary distributions over S_n.

Reflected Diffusion and Reflected Flow Matching. Diffusion models are often trained on data supported on bounded domains (e.g., [0, 255] for unnormalized image pixels), yet the learned dynamics can leave the domain at intermediate time steps, a pathology that is commonly handled by ad-hoc clipping. Lou and Ermon (2023) address this issue by formulating generation as a reflected score-based SDE, augmenting the dynamics with an additional reflection term that enforces the state constraint and keeps trajectories in-bounds throughout sampling. Xie et al. (2024) later extend the same principle to flow matching, incorporating reflection into the corresponding flow/ODE formulations.

3. Methods

Let [N] := {1, ..., N}. A permutation is a bijection σ: [N] → [N]. We can write it as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & N \\ \sigma(1) & \sigma(2) & \cdots & \sigma(N) \end{pmatrix},$$

meaning that each element i ∈ [N] is mapped to σ(i). We can make the source indices (1, 2, ..., N) implicit and use the one-line notation σ = (σ(1), σ(2), ..., σ(N)). Under this convention, σ(i) can be interpreted as the (destination) position of element i after reordering, i.e., the rank of element i of the original sequence in the permuted ordering. We denote by S_N the set of all such permutations (i.e., the symmetric group under composition). Applying a permutation to an ordered list simply reindexes its entries. Equivalently, each σ ∈ S_N can be represented by a permutation matrix P_σ ∈ {0, 1}^{N×N}, so that the same reindexing can be implemented via matrix multiplication.

Given an instance X = (x_1, ..., x_N) ∈ R^{N×d} consisting of N items (with associated features), we aim to model a conditional distribution over permutations σ ∈ S_N.
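The conventions above can be made concrete in a few lines of numpy (a sketch with our own toy values, using 0-based indices):

```python
import numpy as np

# One-line notation (0-based here): element i is sent to position sigma[i],
# i.e. sigma[i] is the rank of element i in the permuted ordering.
sigma = np.array([2, 0, 3, 1])
N = len(sigma)

# Permutation matrix P_sigma implementing the same reindexing:
# (P @ items)[sigma[i]] = items[i].
P = np.zeros((N, N), dtype=int)
P[sigma, np.arange(N)] = 1

items = np.array([10.0, 20.0, 30.0, 40.0])
permuted = P @ items       # element i lands at position sigma[i]
restored = P.T @ permuted  # P_sigma is orthogonal: P^T undoes the reindexing
```

Because permutation matrices are orthogonal, the inverse permutation is simply the transpose, which is what the `restored` line exploits.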
We observe i.i.d. training pairs (X, σ_0) ∼ p⋆, and our goal is to learn a generative model that can efficiently sample valid permutations conditioned on X. We write X_t = σ_t(X) for the permuted instance at time t.

We adopt a diffusion-style approach on S_N. Specifically, we define a forward noising process on permutations that gradually transforms σ_0 toward a target distribution, and train a parameterized reverse-time model p_θ(σ_{t−∆t} | σ_t, X) to approximately invert this corruption process. At test time, sampling starts from the reference distribution and iteratively applies the learned reverse dynamics to generate σ̂_0 ∈ S_N conditioned on X.

3.1. Forward Process: Soft-Rank Diffusion

Instead of defining diffusion dynamics directly on the discrete space S_N (e.g., via card-shuffling methods as in Zhang et al. (2025), which can be abrupt and unstructured), we adopt a continuous latent viewpoint in which a permutation is encoded by the relative ordering of real-valued coordinates. The key idea is to relax the discrete rank values in [N] to continuous soft ranks in [0, 1], and to define a diffusion process over these continuous variables. At any intermediate time, the induced ordering of the latent coordinates yields a valid permutation of the original items.

This relaxation brings two immediate benefits. First, it provides a Euclidean state space where noise injection and interpolation are natural, enabling diffusion-style modeling without relying on discrete, jump-like transitions. Second, the forward marginals can be designed to be analytically tractable, which later allows us to derive a reverse-time sampler.

Intuitively, one may view each item as a particle undergoing Brownian motion: starting from an initial ordered configuration, the particles gradually diffuse, and the permutation at time t is given by the snapshot ordering of their positions.
Formally, to initialize a continuous latent state, we convert a permutation σ ∈ S_N into a canonical soft-rank vector in [0, 1]^N by mapping discrete ranks to a fixed grid. Let {g_r}_{r=1}^{N} be a uniform grid on [0, 1], e.g.,

$$g_r = \frac{r-1}{N-1}, \quad r = 1, \dots, N. \qquad (2)$$

We define the grid-mapping operator LiftToGrid: S_N → [0, 1]^N by

$$\mathrm{LiftToGrid}(\sigma)_i := g_{\sigma(i)}, \quad i \in [N]. \qquad (3)$$

In particular, we set Z_0 := LiftToGrid(σ_0).

We generate Z_t by coupling the data latent Z_0 to a randomly sampled endpoint Z_1 ∼ p_ref via a Brownian bridge, where p_ref is a tractable, data-independent reference distribution (e.g., N(0, I_N) or Unif([0, 1]^N)). We apply this construction coordinate-wise: each entry of Z_t evolves as an independent one-dimensional bridge driven by its own Brownian motion. For clarity, we describe a single coordinate using lowercase notation. Specifically, we consider the following SDE, which corresponds to a VE diffusion bridge as in Zhou et al. (2023):

$$dz_t = \frac{z_1 - z_t}{1 - t}\, dt + \eta\, dw_t + dl_t, \qquad (4)$$

where w_t is a standard one-dimensional Brownian motion, η is the noise scale, and l_t is a reflection term (Lou and Ermon, 2023; Xie et al., 2024) that enforces the state constraint z_t ∈ [0, 1].

Then, at any time t, we obtain a discrete permutation in the rank representation simply by ranking the coordinates of Z_t. Assuming distinct coordinates (almost surely under continuous noise) and using 1-based indexing, we recover σ_t as σ_t = argsort(argsort(Z_t)). Figure 1 illustrates the forward process.
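The forward construction of Eqs. (2)–(4) can be sketched with a simple Euler–Maruyama discretization; the reflection, step count, and noise scale below are our own illustrative choices, not the paper's exact scheme:

```python
import numpy as np

def lift_to_grid(sigma):
    """Eq. (3): map discrete ranks to the uniform grid of Eq. (2) (0-based)."""
    N = len(sigma)
    grid = np.arange(N) / (N - 1)
    return grid[sigma]

def reflect_unit(x):
    """Fold the real line into [0, 1] (reflection at both boundaries)."""
    x = np.mod(x, 2.0)
    return np.where(x > 1.0, 2.0 - x, x)

def forward_bridge(z0, z1, eta=0.1, n_steps=100, rng=None):
    """Simulate dz_t = (z1 - z_t)/(1 - t) dt + eta dW_t, reflected into [0, 1]."""
    rng = rng or np.random.default_rng(0)
    dt = 1.0 / n_steps
    z = z0.copy()
    traj = [z.copy()]
    for k in range(n_steps - 1):  # stop one step before t = 1 (drift blows up)
        t = k * dt
        drift = (z1 - z) / (1.0 - t)
        z = reflect_unit(z + drift * dt
                         + eta * np.sqrt(dt) * rng.standard_normal(z.shape))
        traj.append(z.copy())
    return np.array(traj)

sigma0 = np.array([2, 0, 3, 1])
z0 = lift_to_grid(sigma0)                          # Z_0 on the grid
z1 = np.random.default_rng(1).uniform(size=4)     # Z_1 ~ Unif([0, 1]^n)
traj = forward_bridge(z0, z1)
sigma_t = np.argsort(np.argsort(traj[-1]))        # induced permutation near t = 1
```

At every intermediate snapshot `traj[k]`, the double argsort yields a valid permutation, which is exactly the smooth forward shuffling process in permutation space.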
Figure 1. Soft-Rank Diffusion. We define the forward diffusion by relaxing each item's rank to a continuous soft-rank variable and evolving these soft ranks in a reflected diffusion process (Lou and Ermon, 2023). At each time t, the soft ranks induce a discrete ordering by simple sorting, thereby yielding a forward process in permutation space. For reverse sampling, we couple a discrete denoiser in permutation space (predicting a clean permutation from X_t) with an auxiliary continuous update: we lift the predicted permutation to a grid-aligned soft-rank vector Ẑ_0, sample an intermediate latent Z_s for s < t from a conditional reverse kernel p(Z_s | Z_t, Ẑ_0, Z_1), and map back to permutation space by sorting Z_s, thus stepping backward in time.

3.2. Reverse Process

Although the forward process is defined in a continuous latent space, our observations and learning targets are the induced permutations. In other words, the model operates on discrete permutation-valued states, while the continuous latent dynamics serve as an underlying construction that we invoke only to derive a convenient reverse-time update.

One could treat the continuous latent Z_t as the primary reverse-time state and learn a denoiser directly in R^N. We instead work in S_N: during sampling we keep the permutation σ_t as the explicit state, and reconstruct a rank-consistent latent Z_t only as an auxiliary variable when executing each backward update.

We adopt a σ_0-prediction style parameterization.
Specifically, a neural network f_θ predicts an estimate of the initial permutation,

$$\hat{\sigma}_0 = f_\theta(X_t, t), \qquad (5)$$

which in turn induces an estimate of the clean sequence via the permutation action, X̂_0 = σ̂_0(X). We describe the overall reverse process next, and later provide details on the parameterization of f_θ.

Reverse-time transition kernel. Our forward process in Eq. (4) is a reflected Brownian bridge. Following Xie et al. (2024), in practice one first solves an unconstrained backward step in R^N using the corresponding unreflected bridge conditional, and applies the reflection operator only afterwards (and only when needed). Specifically, let μ(u) := (1 − u)z_0 + u z_1 denote the linear interpolation between endpoints. For the unconstrained bridge, for any 0 < s < t < 1 we have the closed-form Gaussian conditional

$$z_s \mid z_t, z_0, z_1 \sim \mathcal{N}\!\left(\mu(s) + \frac{s}{t}\bigl(z_t - \mu(t)\bigr),\; \eta^2\,\frac{s(t-s)}{t}\right). \qquad (6)$$

Equivalently, we sample a proposal z̃_s by

$$\tilde{z}_s = \mu(s) + \frac{s}{t}\bigl(z_t - \mu(t)\bigr) + \eta\sqrt{\frac{s(t-s)}{t}}\,\epsilon, \quad \epsilon \sim \mathcal{N}(0, 1), \qquad (7)$$

and then enforce the latent-domain constraint via reflection:

$$z_s \leftarrow R(\tilde{z}_s). \qquad (8)$$

We defer the derivation of Eq. (6) to Appendix B.

The reverse sampling procedure is initialized by drawing z_1 ∼ p_ref. Since our model maintains only a permutation-valued state, we obtain the corresponding initial permutation

Algorithm 1 Reverse Sampling via Reflected Gaussian-Bridge Updates
Require: number of items n; time grid 0 = t_0 < t_1 < ... < t_K = 1; diffusion scale η; reference distribution p_ref on [0, 1]^n; score network f_θ defining p_θ(σ | X_t, t); operators LiftToGrid(·) and Reflect(·).
1: Sample z_1 ∼ p_ref and set z_{t_K} ← z_1.
2: for k = K, K−1, ..., 1 do
3:   t ← t_k, s ← t_{k−1}.
4:   X_t ← argsort(z_t)  (induce discrete state)
5:   Sample σ̂_0 ∼ p_θ(· | X_t, t)  (via f_θ)
6:   ẑ_0 ← LiftToGrid(σ̂_0) ∈ [0, 1]^n.
7:   μ_t ← (1 − t)ẑ_0 + t z_1,  μ_s ← (1 − s)ẑ_0 + s z_1.
8:   Sample ξ ∼ N(0, I_n).
9:   z̄_s ← μ_s + (s/t)(z_t − μ_t) + η √(s(t − s)/t) ξ.
10:  z_s ← Reflect(z̄_s).
11:  z_t ← z_s.
12: end for
13: return z_{t_0} and optionally σ̂ ← argsort(z_{t_0}).

by mapping the latent to the grid via σ_1 = LiftToGrid(z_1) in Eq. (3). The complete reverse-time algorithm is summarized in Algorithm 1 and illustrated in Figure 1.

3.3. Model architecture

We now describe the model architecture used to parameterize the σ_0-prediction distribution and to sample an estimate of the clean permutation, as in Eq. (5).

We extend the Generalized Plackett–Luce (GPL) parameterization of Zhang et al. (2025) to a contextualized Generalized Plackett–Luce (cGPL) model. In contrast to the encoder-only architecture used in Zhang et al. (2025), we implement cGPL with a full encoder–decoder Transformer (Vaswani et al., 2023). This formulation generalizes GPL, and we show empirically that cGPL yields a more expressive parameterization across our benchmarks.

Contextualized GPL. Similar to GPL (Zhang et al., 2025), cGPL models a permutation σ ∈ S_n with a stagewise distribution that, at each position i, normalizes scores over the remaining (unselected) set R_i(σ) := [n] \ {σ(1), ..., σ(i−1)}:

$$p_\theta(\sigma \mid X, t) = \prod_{i=1}^{n} \frac{\exp s_{\sigma(i),i}}{\sum_{k \in \mathcal{R}_i(\sigma)} \exp s_{k,i}}. \qquad (9)$$

The distinction between GPL and cGPL lies entirely in the context each position-i score vector s_{:,i} is allowed to depend on. In GPL, scores are prefix-agnostic and can be computed once as a static matrix. Consequently, sampling from GPL consists of a single forward pass to obtain all scores, followed by sequentially reading the corresponding columns and sampling without replacement.
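The stagewise masking-and-renormalization of Eq. (9) can be sketched as follows for the static (GPL) case, where the score matrix is computed once; in cGPL, each column would instead be recomputed autoregressively from the sampled prefix. Function names and the random score matrix are ours:

```python
import itertools
import numpy as np

def sample_gpl(score_matrix, rng):
    """Sample sigma from a score matrix S[k, i] as in Eq. (9): at step i,
    softmax the i-th column over the remaining items, without replacement."""
    n = score_matrix.shape[0]
    remaining = np.ones(n, dtype=bool)
    sigma = []
    for i in range(n):
        logits = np.where(remaining, score_matrix[:, i], -np.inf)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        k = rng.choice(n, p=p)
        sigma.append(k)
        remaining[k] = False
    return np.array(sigma)

def gpl_nll(score_matrix, sigma):
    """Negative log-likelihood with -inf masking of already-selected items."""
    n = len(sigma)
    remaining = np.ones(n, dtype=bool)
    nll = 0.0
    for i in range(n):
        logits = np.where(remaining, score_matrix[:, i], -np.inf)
        log_z = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
        nll -= score_matrix[sigma[i], i] - log_z
        remaining[sigma[i]] = False
    return nll

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))   # stand-in for f_theta(X, t)
perm = sample_gpl(S, rng)
loss = gpl_nll(S, perm)
```

Summing the exponentiated negative NLL over all of S_4 recovers exactly 1, confirming that masking plus renormalization defines a valid distribution over permutations.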
GPL:

$$s_{k,i} := [f_\theta(X, t)]_{k,i}. \qquad (10)$$

In contrast, cGPL makes scores explicitly prefix-conditional through an encoder–decoder Transformer. As a result, sampling from cGPL must proceed autoregressively: scores are recomputed progressively as the prefix grows.

Figure 2. Sampling in PL/GPL/cGPL. (a) PL and GPL. In PL, each item is assigned a single scalar score; sampling a permutation amounts to repeatedly sampling from the same score vector without replacement, masking selected items and renormalizing at each step. In GPL, each item is assigned a length-N score vector, yielding position-specific logits: at step i we sample according to the i-th column of the score matrix (after masking previously selected items), proceeding sequentially from i = 1 to N. (b) cGPL. In cGPL, each item is assigned a position-dependent score vector dynamically. Sampling proceeds autoregressively: the score vector at later positions depends on the outcomes sampled at preceding positions. As in GPL, we apply masking and subsequent renormalization to obtain a valid probability distribution over the remaining items.

Note that q̃_t in the discrete permutation space is generally intractable, but as discussed in Section 3.2, we can augment it with a tractable update step in the continuous soft-rank space.

With the feasibility mask in Eq. (13), the negative log-likelihood reduces to a sum of cross-entropies on the masked scores:

$$\mathcal{L}(\sigma; X, t) = -\sum_{i=1}^{n} \log \frac{\exp \tilde{s}_{\sigma(i),i}}{\sum_{k=1}^{n} \exp \tilde{s}_{k,i}}. \qquad (15)$$

Pointer parameterization. As an alternative to a fixed n-way linear head that directly outputs n logits (treating output coordinates as a fixed "vocabulary" indexed by absolute positions), we draw inspiration from Pointer Networks (Vinyals et al., 2017) and parameterize the cGPL scores with an item-aligned pointer head that explicitly points to the encoded input items.
Concretely, at decoding step i we compute a compatibility score between the decoder state and each encoder item representation, and interpret the resulting length-n vector as a categorical distribution over the input indices (rather than over a fixed output dictionary); see Fig. 4 for an illustration. This parameterization naturally supports variable-size output dictionaries and provides another way of producing

Algorithm 2 Autoregressive Sampling from cGPL
Require: input set X = {x_1, ..., x_n}; diffusion step t; neural decoder f_θ
Ensure: permutation σ̂ ∈ S_n
1: R ← {1, 2, ..., n}  (remaining items)
2: σ̂ ← [ ]  (empty prefix)
3: for i = 1, ..., n do
4:   Compute logits s_{:,i} ← [f_θ(X, σ̂

By contrast, Soft-Rank Diffusion degrades more gracefully, indicating improved scalability and robustness to longer permutations.

This scaling advantage is further illustrated in Figure 3, which examines exact-match accuracy as a function of the sequence length N. Figure 3a shows that both variants of Soft-Rank Diffusion are substantially less prone to catastrophic accuracy collapse as N increases, with the Pointer variant exhibiting consistently stronger robustness at larger sequence lengths. Figure 3b further quantifies this effect by reporting the relative exact-match accuracy of Soft-Rank Diffusion over SymmetricDiffusers, computed as the ratio relative to the baseline accuracy. The improvement margin grows rapidly with N, indicating that Soft-Rank Diffusion achieves increasingly favorable scaling behavior compared to SymmetricDiffusers.

4.2. Traveling Salesperson Problem (TSP)

We evaluate Soft-Rank Diffusion on the traveling salesperson problem (TSP), an NP-hard combinatorial optimization task. Given a set of 2D points V = {v_1, ..., v_n} ⊂ R^2, the goal is to predict a tour permutation σ ∈ S_n that minimizes the total length

$$\sum_{i=1}^{n} \| v_{\sigma(i)} - v_{\sigma(i+1)} \|_2, \quad \text{where } \sigma(n+1) := \sigma(1).$$
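The tour-length objective can be evaluated directly from a predicted permutation. The sketch below uses made-up coordinates, and the gap line uses the common L/L* − 1 convention for illustration (the paper's exact gap definition is deferred to its Appendix A.2):

```python
import numpy as np

def tour_length(points, sigma):
    """Cyclic tour length: sum_i ||v_sigma(i) - v_sigma(i+1)||_2, sigma(n+1) = sigma(1)."""
    tour = points[sigma]
    # np.roll shifts the tour by one, so each row pairs a city with its successor
    return float(np.linalg.norm(tour - np.roll(tour, -1, axis=0), axis=1).sum())

pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # unit square
best = tour_length(pts, np.array([0, 1, 2, 3]))    # perimeter tour, length 4.0
cross = tour_length(pts, np.array([0, 2, 1, 3]))   # self-crossing tour, longer
gap = cross / best - 1.0                           # optimality gap vs. the best tour
```

Because the sum is cyclic, every rotation or reversal of `sigma` yields the same length, which is why exact-match accuracy is not a meaningful TSP metric and tour length is reported instead.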
We report the tour length and the optimality gap (see Appendix A.2 for details) on TSP-20 and TSP-50 (n = 20 and n = 50, respectively). We rerun SymmetricDiffusers using its official GitHub repository and default hyperparameters. For SymmetricDiffusers, we again use GPL sampling and disable beam search to ensure a fair comparison; see Appendix A.2.

This task is particularly well suited to highlight the advantage of our dynamic cGPL parameterization: unlike static scoring schemes, TSP decoding requires making prefix-dependent decisions over a shrinking set of remaining cities, where the desirability of each candidate depends strongly on the current partial tour. By producing context-conditioned, step-specific logits over the remaining items, cGPL better matches this dynamic decision structure and therefore exhibits larger performance gains as problem size increases.

Table 2 reports Euclidean TSP results on TSP-20 and TSP-50. Across both problem sizes, Soft-Rank Diffusion substantially improves over SymmetricDiffusers, reducing tour length and closing the optimality gap by more than two orders of magnitude. Moreover, the Pointer variant consistently achieves the best performance, yielding further gains over the fixed variant, especially on TSP-50, highlighting the benefit of our dynamic cGPL parameterization for prefix-dependent decoding over a shrinking candidate set.

5. Ablation Studies

We study the impact of several key design choices of Soft-Rank Diffusion in Table 4. While our default configuration adopts a σ_0 parametrization for the cGPL reverse model, we observe that the model achieves consistently strong performance when using a σ_{t−1} parametrization, as also employed in Zhang et al. (2025). In contrast, when paired with the Riffle Shuffle forward process, the σ_0 parametrization consistently fails, leading to a severe degradation in performance across all evaluation metrics.
This may be due to the intractability of the reverse transition when the forward process is defined via a riffle shuffle. Moreover, even under the shared σ_{t−1} parametrization, models trained with the Soft-Rank Diffusion forward process consistently outperform those using the Riffle Shuffle forward process. Finally, incorporating a biaffine pointer mechanism in the cGPL reverse model further improves performance over the vanilla cGPL variant, highlighting the benefit of explicitly modeling autoregressive selection in the reverse dynamics.

Table 1. MNIST sorting benchmark. We compare baselines and our models across sequence lengths. Bold indicates the best value in each column. For a controlled comparison, SymmetricDiffusers and both Soft-Rank Diffusion variants use the same 7-layer Transformer backbone (ours: 3-layer encoder + 4-layer decoder with matched hidden width). This differs from Zhang et al. (2025), which used a 12-layer Transformer for n = 200 and a 7-layer Transformer for other lengths.

| Method | Metric | 9 | 15 | 32 | 52 | 75 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|---|---|---|
| DiffSort (Petersen et al., 2022) | Kendall-Tau ↑ | 0.8163 | 0.7386 | 0.4423 | 0.3037 | 0.2227 | 0.1617 | 0.0891 | 0.0636 |
| | Accuracy ↑ | 0.5555 | 0.2350 | 0.0003 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | Correctness ↑ | 0.8643 | 0.7975 | 0.5335 | 0.3897 | 0.2987 | 0.2251 | 0.1307 | 0.0953 |
| Error-free DiffSort (Kim et al., 2024) | Kendall-Tau ↑ | 0.9151 | 0.9160 | 0.8798 | 0.0042 | 0.0005 | 0.0029 | 0.0006 | 0.0002 |
| | Accuracy ↑ | 0.7953 | 0.7492 | 0.5186 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| | Correctness ↑ | 0.9365 | 0.9334 | 0.9019 | 0.0258 | 0.0132 | 0.0112 | 0.0072 | 0.0030 |
| SymmetricDiffusers with GPL (Zhang et al., 2025) | Kendall-Tau ↑ | 0.9483 | 0.9320 | 0.8610 | 0.7890 | 0.7304 | 0.1114 | 0.0641 | 0.0022 |
| | Accuracy ↑ | 0.8955 | 0.8260 | 0.5896 | 0.3009 | 0.1379 | 0.0000 | 0.0000 | 0.0000 |
| | Correctness ↑ | 0.9605 | 0.9450 | 0.8846 | 0.8232 | 0.7733 | 0.1620 | 0.0975 | 0.0061 |
| Soft-Rank Diffusion with cGPL (Ours) | Kendall-Tau ↑ | 0.9668 | 0.9384 | 0.8765 | 0.8092 | 0.6925 | 0.6279 | 0.4821 | 0.3982 |
| | Accuracy ↑ | 0.9397 | 0.8601 | 0.6555 | 0.4239 | 0.1908 | 0.0863 | 0.0080 | 0.0004 |
| | Correctness ↑ | 0.9742 | 0.9505 | 0.8967 | 0.8393 | 0.7347 | 0.6773 | 0.5468 | 0.4644 |
| Soft-Rank Diffusion with Pointer-cGPL (Ours) | Kendall-Tau ↑ | 0.9645 | 0.9455 | 0.8944 | 0.8459 | 0.7727 | 0.7454 | 0.6198 | 0.6048 |
| | Accuracy ↑ | 0.9329 | 0.8710 | 0.6854 | 0.4909 | 0.2742 | 0.1922 | 0.0421 | 0.0137 |
| | Correctness ↑ | 0.9719 | 0.9559 | 0.9119 | 0.8724 | 0.8082 | 0.7849 | 0.6704 | 0.6607 |

Table 2. TSP performance on TSP-20 and TSP-50. We report tour length L(Tour) and the optimality gap (both lower is better).

| Method | TSP-20 L(Tour) ↓ | TSP-20 Gap ↓ | TSP-50 L(Tour) ↓ | TSP-50 Gap ↓ |
|---|---|---|---|---|
| SymmetricDiffusers with GPL (Zhang et al., 2025) | 5.4342 | 0.4169 | 12.8525 | 1.2618 |
| Soft-Rank Diffusion with cGPL (Ours) | 3.8732 | 0.0079 | 5.7336 | 0.0078 |
| Soft-Rank Diffusion with Pointer-cGPL (Ours) | 3.8557 | 0.0034 | 5.7157 | 0.0047 |

6. Conclusions

We presented Soft-Rank Diffusion, a diffusion framework for permutation-valued data that bridges diffusion modeling with principled ranking distributions. By lifting permutations into a continuous latent space through soft-rank representations, our approach enables a smoother and more tractable forward process than existing abrupt schemes such as riffle shuffles, and a theoretically sound reverse solver grounded in Plackett–Luce-style likelihoods. Central to our method is the proposed contextualized Generalized Plackett–Luce (cGPL) parameterization, which conditions each selection step on the evolving prefix and generalizes prior prefix-agnostic models.

Across sorting and combinatorial optimization benchmarks, Soft-Rank Diffusion consistently outperforms existing permutation diffusion and differentiable sorting approaches, with particularly strong gains on long and intrinsically sequential permutations. These results highlight the importance of context-aware reverse dynamics for scalable permutation modeling.
We believe this framework opens the door to further unifying diffusion-based generative modeling with structured, likelihood-based formulations over discrete combinatorial objects.

Impact Statement

This paper advances methods in machine learning. We do not anticipate societal impacts that require specific discussion beyond standard considerations.

References

Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces, 2023. URL https://arxiv.org/abs/2107.03006.

Mathieu Blondel, Olivier Teboul, Quentin Berthet, and Josip Djolonga. Fast differentiable sorting and ranking, 2020.

Andrew Campbell, Joe Benton, Valentin De Bortoli, Tom Rainforth, George Deligiannidis, and Arnaud Doucet. A continuous time framework for discrete denoising models, 2022.

Marco Cuturi, Olivier Teboul, and Jean-Philippe Vert. Differentiable ranks and sorting using optimal transport, 2019.

Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, and Jonas Adler. Continuous diffusion for categorical data, 2022. URL https://arxiv.org/abs/2211.15089.

Yufei Feng, Yu Gong, Fei Sun, Junfeng Ge, and Wenwu Ou. Revisit recommender system in the permutation prospective, 2021.

Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching, 2024. URL https://arxiv.org/abs/2407.15595.

Edgar Gilbert. Theory of shuffling. Technical memorandum, Bell Laboratories, 1955.

Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic optimization of sorting networks via continuous relaxations, 2019. URL https://arxiv.
org/abs/1903.08850 . Ishaan Gulrajani and T atsunori B. Hashimoto. Likelihood- based diffusion language models, 2023. URL https: //arxiv.org/abs/2305.18619 . Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020. URL https:// arxiv.org/abs/2006.11239 . Chaitanya K. Joshi, Quentin Cappart, Louis-Martin Rousseau, and Thomas Laurent. Learning tsp requires rethinking generalization. volume 210, pages 33:1–33:21. Schloss Dagstuhl – Leibniz-Zentrum f ¨ ur Informatik, 2021. doi: 10.4230/LIPICS.CP .2021.33. URL https://drops.dagstuhl.de/entities/ document/10.4230/LIPIcs.CP.2021.33 . M. G. Kendall. A new measure of rank correlation. Biometrika , 30(1-2):81–93, 06 1938. ISSN 0006-3444. doi: 10.1093/biomet/30.1- 2.81. URL https://doi. org/10.1093/biomet/30.1- 2.81 . Jungtaek Kim, Jeongbeen Y oon, and Minsu Cho. General- ized neural sorting netw orks with error -free differentiable swap functions, 2024. URL abs/2310.07174 . Y ehuda K oren, Robert Bell, and Chris V olinsky . Matrix factorization techniques for recommender systems. Com- puter , 42(8):30–37, 2009. doi: 10.1109/MC.2009.263. T ie-Y an Liu. Learning to rank for information retriev al. F ound. T rends Inf. Retr . , 3(3):225–331, March 2009. ISSN 1554-0669. doi: 10.1561/1500000016. URL https://doi.org/10.1561/1500000016 . Aaron Lou and Stefano Ermon. Reflected diffusion mod- els, 2023. URL 04740 . Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete dif f usion modeling by estimating the ratios of the data dis- tribution, 2024. URL 2310.16834 . R. D. Luce. Individual Choice Behavior . John W iley , 1959. T orchV ision maintainers and contributors. T orchvision: Py- torch’ s computer vision library . https://github. com/pytorch/vision , 2016. Gonzalo Mena, David Belanger, Scott Linderman, and Jasper Snoek. Learning latent permutations with gumbel- sinkhorn networks, 2018. URL https://arxiv. org/abs/1802.08665 . Felix Petersen, Christian Bor gelt, Hilde Kuehne, and Oliv er Deussen. 
Monotonic differentiable sorting networks, 2022. URL https://arxiv.org/abs/2203.09630.

R. L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society: Series C (Applied Statistics), 24(2):193–202, 1975. doi: 10.2307/2346567. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.2307/2346567.

Sebastian Prillo and Julian Martin Eisenschlos. SoftSort: A continuous relaxation for the argsort operator, 2020. URL https://arxiv.org/abs/2006.16038.

Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis K. Titsias. Simplified and generalized masked diffusion for discrete data, 2025. URL https://arxiv.org/abs/2406.04329.

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics, 2015.

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models, 2022. URL https://arxiv.org/abs/2010.02502.

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations, 2021. URL https://arxiv.org/abs/2011.13456.

Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, and Hanjun Dai. Score-based continuous-time discrete diffusion models, 2023. URL https://arxiv.org/abs/2211.16750.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023. URL https://arxiv.org/abs/1706.03762.

Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard. DiGress: Discrete denoising diffusion for graph generation, 2023.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks, 2017. URL https://arxiv.org/abs/1506.03134.

Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, and Cheng Zhang. Reflected flow matching, 2024.
URL https://arxiv.org/abs/2405.16577.

Yongxing Zhang, Donglin Yang, and Renjie Liao. SymmetricDiffusers: Learning discrete diffusion on finite symmetric groups, 2025. URL https://arxiv.org/abs/2410.02942.

Linqi Zhou, Aaron Lou, Samar Khanna, and Stefano Ermon. Denoising diffusion bridge models, 2023. URL https://arxiv.org/abs/2309.16948.

A. Details on experimental setup

A.1. 4-digit MNIST sorting

Dataset  Following Zhang et al. (2025), we generate the 4-digit MNIST dataset using the code from their official GitHub repository. For each experiment, we construct training sequences by sampling 60,000 length-N sequences of 4-digit MNIST images from torchvision.datasets.MNIST (maintainers and contributors, 2016), where N is an experiment-specific parameter. We generate an additional 10,000 sequences for testing. Each 4-digit image is created on the fly by first sampling four digits uniformly at random, then drawing the corresponding digit images from MNIST, and finally concatenating them into a single 4-digit image.

Model architecture  For SymmetricDiffusers, we use a 7-layer encoder-only Transformer with hidden dimension 512, feedforward dimension 128, and 8 attention heads. For Soft-Rank Diffusion, we match the overall model capacity by using a 7-layer encoder–decoder Transformer, with 3 layers in the encoder and 4 layers in the decoder. We keep the hidden and feedforward dimensions the same as SymmetricDiffusers. Note: While Zhang et al. (2025) use a 12-layer Transformer for N = 200, we use a 7-layer Transformer for all N to keep model capacity consistent across sequence lengths and ensure performance is comparable across settings.

Training  All models are trained on a single NVIDIA H100 GPU with batch size 64 for 120 epochs. Training data are generated on the fly at each epoch.
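The on-the-fly construction described above can be sketched as follows. To keep the snippet self-contained, random arrays stand in for the MNIST digit images (the actual pipeline draws them via torchvision.datasets.MNIST); function and variable names are illustrative, not the repository's.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sequence(digit_images, digit_labels, seq_len):
    """Build one length-N training example: for each of the seq_len items,
    sample four digits, concatenate their 28x28 images side by side into a
    28x112 image, and record the underlying 4-digit number as the label."""
    images, numbers = [], []
    for _ in range(seq_len):
        idx = rng.integers(0, len(digit_labels), size=4)
        images.append(np.concatenate([digit_images[i] for i in idx], axis=1))
        numbers.append(int("".join(str(digit_labels[i]) for i in idx)))
    return np.stack(images), np.array(numbers)

# Stand-in pool of "digit" images and labels (real code would use MNIST).
pool_images = rng.random((100, 28, 28))
pool_labels = rng.integers(0, 10, size=100)

seq, nums = make_sequence(pool_images, pool_labels, seq_len=9)
gt_perm = np.argsort(nums)  # ground-truth ascending order from the labels
```

Because each item's label is its 4-digit number, the ground-truth permutation is simply the argsort of the labels, as used at evaluation time.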
For SymmetricDiffusers, we use the default hyperparameters from the authors' official GitHub repository. For N ∈ {75, 150}, where no default hyperparameters are provided, we reuse the settings for N = 52 and N = 100, respectively.

Inference and metrics  At inference time, we randomly generate 10,000 sequences of N 4-digit MNIST images. Since each image is labeled by its underlying 4-digit number, we can obtain the ground-truth ascending order directly from the labels. We report the same three metrics as Zhang et al. (2025): Kendall–Tau coefficient, Accuracy (exact match), and Correctness (element-wise match). Kendall–Tau measures rank correlation between the predicted and ground-truth permutations; Accuracy indicates whether the two permutations are identical; and Correctness is the fraction of elements placed in the correct position.

Non-diffusion model baselines  We retrain and evaluate DiffSort (Petersen et al., 2022) and Error-free DiffSort (Kim et al., 2024) under the same dataset setup. For DiffSort, we use the default hyperparameters from the authors' official GitHub repository, with the odd–even sorting network, steepness 10, and learning rate 10^{-3.5}. For Error-free DiffSort, we use TransformerL as the backbone and the hyperparameters listed in Table 3.

Table 3. Hyperparameters for Error-free DiffSort on 4-digit MNIST sorting.

Sequence Length   Steepness   Sorting Network   Loss Weight   Learning Rate
9                 34          odd–even          1.00          10^{-4}
15                25          odd–even          0.10          10^{-4}
32                124         odd–even          0.10          10^{-4}
52                130         bitonic           0.10          10^{-3.5}
75                135         bitonic           0.10          10^{-3.5}
100               140         bitonic           0.10          10^{-3.5}
150               170         bitonic           0.10          10^{-3.5}
200               200         bitonic           0.10          10^{-4}

A.2. TSP

Dataset  For TSP-20 and TSP-50, we use the dataset from Zhang et al. (2025), which is adapted from Joshi et al. (2021). The training set contains 1,512,000 graphs, and the test set contains 1,280 graphs.
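The three sorting metrics described in A.1 admit short reference implementations. The sketch below follows the definitions in the text; the evaluation code in the official repositories may differ in details such as tie handling.

```python
from itertools import combinations

def kendall_tau(pred, true):
    """Kendall-Tau: (concordant - discordant) pairs over all pairs,
    where pred and true are the predicted and ground-truth rank sequences."""
    n = len(pred)
    signed = sum(
        1 if (pred[i] < pred[j]) == (true[i] < true[j]) else -1
        for i, j in combinations(range(n), 2)
    )
    return signed / (n * (n - 1) / 2)

def accuracy(pred, true):
    """Exact match: 1.0 iff the whole permutation is correct."""
    return float(list(pred) == list(true))

def correctness(pred, true):
    """Element-wise match: fraction of positions placed correctly."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)
```

On permutations (no ties) this Kendall–Tau equals the classical coefficient of Kendall (1938), ranging from −1 (reversed order) to 1 (identical order).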
Model architecture  For SymmetricDiffusers, we use a 16-layer encoder-only Transformer with hidden dimension 1024, feedforward dimension 256, and 8 attention heads. For Soft-Rank Diffusion, we match the overall model capacity by using a 16-layer encoder–decoder Transformer, with 8 layers in the encoder and 8 layers in the decoder. We keep the hidden and feedforward dimensions the same as SymmetricDiffusers.

Training  All models are trained on a single NVIDIA H100 GPU for 50 epochs with batch size 64. We use a peak learning rate of 2 × 10^{-4} with a cosine decay schedule and 51,600 warm-up steps.

Inference and metrics  At inference time, we evaluate on the 1,280 held-out test graphs and report two metrics: the tour length of the predicted solution, L_pred, and the optimality gap relative to an OR solver baseline with tour length L_OR. The optimality gap is defined as

    Gap = (L_pred − L_OR) / L_OR.

B. Derivations

Here we derive the reverse conditional transition kernel q(z_s | z_t, z_0, z_1) for 0 < s < t < 1 in Eq. 6. We present the derivation for a single scalar soft-rank coordinate; the N-dimensional extension follows by independence across coordinates.

For the (unreflected) Brownian bridge with noise scale η, the marginal at time u ∈ (0, 1) is Gaussian:

    q(z_u | z_0, z_1) = N(z_u; μ(u), η² u(1 − u)),   μ(u) := (1 − u) z_0 + u z_1.   (17)

A convenient representation of the Brownian bridge is

    z_u = μ(u) + η b_u,   u ∈ [0, 1],   (18)

where b_u is a standard Brownian bridge with b_0 = b_1 = 0 and covariance Cov(b_s, b_t) = min(s, t) − st.
Therefore, conditional on (z_0, z_1), the pair (z_s, z_t) is jointly Gaussian with means

    E[z_s | z_0, z_1] = μ(s),   E[z_t | z_0, z_1] = μ(t),   (19)

variances

    v_s := Var(z_s | z_0, z_1) = η² s(1 − s),   v_t := Var(z_t | z_0, z_1) = η² t(1 − t),   (20)

and covariance (for s < t)

    c_st := Cov(z_s, z_t | z_0, z_1) = η² (min(s, t) − st) = η² s(1 − t).   (21)

Hence, writing μ_s := μ(s), μ_t := μ(t), and c := c_st,

    q(z_s, z_t | z_0, z_1) = N( (z_s, z_t); (μ_s, μ_t), [[v_s, c], [c, v_t]] ),   q(z_t | z_0, z_1) = N(z_t; μ_t, v_t).   (22)

By Bayes' rule,

    q(z_s | z_t, z_0, z_1) = q(z_s, z_t | z_0, z_1) / q(z_t | z_0, z_1).   (23)

Since the denominator does not depend on z_s (for fixed z_t), it suffices to view the joint density as a function of z_s and complete the square. Let x := z_s − μ_s and y := z_t − μ_t. Using the closed-form inverse of a symmetric 2 × 2 matrix, the inverse of the covariance matrix is

    [[v_s, c], [c, v_t]]^{-1} = (1/D) [[v_t, −c], [−c, v_s]],   D := v_s v_t − c².   (24)

Therefore,

    (x, y) [[v_s, c], [c, v_t]]^{-1} (x, y)^T = (1/D)(v_t x² − 2cxy + v_s y²) = (v_t/D)(x − (c/v_t) y)² + const(y),   (25)

where const(y) does not depend on x (hence not on z_s). This yields the Gaussian conditional

    z_s | z_t, z_0, z_1 ~ N( μ_s + (c/v_t)(z_t − μ_t), v_s − c²/v_t ).   (26)

Substituting v_s = η² s(1 − s), v_t = η² t(1 − t), and c = η² s(1 − t) (for s < t), we obtain

    c/v_t = s/t,   v_s − c²/v_t = η² s(t − s)/t.   (27)

Hence,

    z_s | z_t, z_0, z_1 ~ N( μ(s) + (s/t)(z_t − μ(t)), η² s(t − s)/t ),   0 < s < t < 1.   (28)

Extension to N dimensions. Under coordinate-wise independent bridges sharing the same noise scale η, the same conditional holds in each coordinate, so the reverse kernel is Gaussian with the same mean applied coordinate-wise and conditional covariance η² s(t − s)/t · I_N.

C. Additional figure and tables

Figure 4. Model architecture and Pointer-cGPL parameterization for permutation generation. (a) Model Architecture. (b) Pointer-cGPL factorization. We adopt a standard encoder–decoder Transformer backbone (Vaswani et al., 2023). In the vanilla cGPL parameterization, the decoder states are mapped to logits via a linear output head (yielding a distribution over candidates at each step). In contrast, Panel 4b illustrates Pointer-cGPL, where a bi-affine compatibility module scores each encoded item representation against the current decoder state, producing step-wise logits over the input items that can be interpreted as a pointer distribution.

Table 4. Ablation study over forward diffusion processes, reverse models, and parametrizations on sorting 32 4-digit MNIST images.

Forward Process       Reverse Model              Parametrization   Kendall-Tau ↑   Accuracy ↑   Correctness ↑
Riffle Shuffle        GPL                        x_{t−1}           0.8610          0.5896       0.8846
Riffle Shuffle        GPL                        x_0               −0.0013         0.0000       0.0312
Soft-Rank Diffusion   cGPL                       x_{t−1}           0.8764          0.6425       0.8958
Soft-Rank Diffusion   cGPL w/ Biaffine Pointer   x_{t−1}           0.9074          0.7152       0.9232
Riffle Shuffle        cGPL                       x_{t−1}           0.8527          0.5833       0.8758
Riffle Shuffle        cGPL w/ Biaffine Pointer   x_{t−1}           0.8742          0.6404       0.8950
Riffle Shuffle        cGPL                       x_0               −0.0011         0.0000       0.0313
Riffle Shuffle        cGPL w/ Biaffine Pointer   x_0               0.0001          0.0000       0.0314
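As a numerical sanity check on the reverse kernel in Eq. (28), one can simulate pairs (z_s, z_t) from the Brownian bridge and verify that the regression slope of z_s on z_t is s/t and that the residual variance is η² s(t − s)/t. A minimal Monte-Carlo sketch (all variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eta, s, t = 200_000, 1.0, 0.3, 0.6
z0, z1 = -1.0, 2.0
mu = lambda u: (1 - u) * z0 + u * z1  # bridge mean, as in Eq. (17)

# Sample Brownian motion at times s < t < 1, then form the standard
# bridge b_u = W_u - u * W_1 and the scaled bridge z_u = mu(u) + eta * b_u.
Ws = rng.normal(0.0, np.sqrt(s), n)
Wt = Ws + rng.normal(0.0, np.sqrt(t - s), n)
W1 = Wt + rng.normal(0.0, np.sqrt(1 - t), n)
zs = mu(s) + eta * (Ws - s * W1)
zt = mu(t) + eta * (Wt - t * W1)

# Eq. (28) predicts slope c/v_t = s/t and residual variance eta^2 s(t-s)/t.
slope = np.cov(zs, zt)[0, 1] / np.var(zt)
resid_var = np.var(zs - slope * zt)
# Expected: slope close to s/t = 0.5, resid_var close to eta^2 s(t-s)/t = 0.15
```

The empirical slope and residual variance match the closed-form conditional mean coefficient and covariance of Eq. (28) up to Monte-Carlo error, confirming the derivation for a representative (s, t, η).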