Chunk-Boundary Artifact in Action-Chunked Generative Policies: A Noise-Sensitive Failure Mechanism
Action chunking has become a central design choice for generative visuomotor policies, yet the execution discontinuities that arise at chunk boundaries remain poorly understood. In a frozen pretrained action-chunked policy, we identify chunk-boundary artifact as a noise-sensitive failure mechanism. First, artifact is strongly associated with task failure (p < 1e-4, permutation test) and emerges during the rollout rather than only as a post-hoc symptom. Second, under a fixed observation context, changing only latent noise systematically modulates artifact magnitude. Third, by identifying artifact-related directions in noise space and applying trajectory-level steering, we reliably alter artifact magnitude across all evaluated tasks. In hard-task settings with remaining outcome headroom, the success/failure distribution shifts accordingly; on near-ceiling tasks, positive gains are compressed by policy saturation, while the negative causal effect remains visible. Overall, we recast boundary discontinuity from an unavoidable execution nuisance into an analyzable, noise-dominated, and intervenable failure mechanism.
💡 Research Summary
This paper investigates a failure mechanism that arises at the boundaries of action chunks in generative visuomotor policies. Action‑chunked policies predict a sequence of future actions (a “chunk”) in a single forward pass, then execute only the first K steps before replanning from a new observation. Because each chunk is generated independently, conditioned on the current observation and a freshly drawn latent noise vector z, the transition from the tail of one chunk to the head of the next can produce abrupt changes in the action trajectory. The authors call this discontinuity the “boundary artifact.”
The authors first define a quantitative metric: the boundary‑interior jerk contrast (J_B − J_I). Jerk is measured here as the L2 norm of the second‑order finite difference of the action sequence, ‖a_t − 2a_{t‑1} + a_{t‑2}‖₂. J_B is the mean jerk over boundary phases (t mod K = 0, 1) and J_I is the mean jerk over interior phases (t mod K = 2, 3, 4), where K is the number of steps executed per chunk. A positive contrast therefore indicates excess discontinuity at chunk boundaries relative to the chunk interior.
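The metric defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `jerk_contrast`, the `(T, D)` array layout, and the value `chunk_len=5` in the usage note are assumptions (the summary does not state K; the listed phases only imply that K is at least 5).

```python
import numpy as np

def jerk_contrast(actions, chunk_len,
                  boundary_phases=(0, 1), interior_phases=(2, 3, 4)):
    """Return (J_B, J_I, J_B - J_I) for one rollout.

    actions   : array of shape (T, D), one executed action per timestep
    chunk_len : number of executed steps per chunk (K in the text)
    """
    actions = np.asarray(actions, dtype=float)

    # Second-order finite difference a_t - 2 a_{t-1} + a_{t-2}, defined for t >= 2.
    second_diff = actions[2:] - 2.0 * actions[1:-1] + actions[:-2]
    jerk = np.linalg.norm(second_diff, axis=1)  # L2 norm per timestep

    # Phase of each timestep within its chunk (t mod K).
    t = np.arange(2, len(actions))
    phase = t % chunk_len

    j_b = jerk[np.isin(phase, boundary_phases)].mean()  # boundary mean
    j_i = jerk[np.isin(phase, interior_phases)].mean()  # interior mean
    return j_b, j_i, j_b - j_i
```

Calling `jerk_contrast(actions, chunk_len=5)` on a rollout that is smooth inside each chunk but jumps when a new chunk starts yields a large J_B and a J_I near zero, which is the signature the contrast is meant to isolate.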
Using two tasks from the LIBERO benchmark (open‑drawer and open‑drawer‑plus‑place‑bowl), the authors run 70 rollouts per task with a frozen OpenPI policy (no weight updates). They find that failed episodes exhibit a substantially larger boundary‑interior jerk contrast (+0.238 for the harder task, +0.110 for the easier one) than successful episodes. Permutation tests with 20 000 samples yield p < 10⁻⁴, confirming a strong statistical association. Importantly, the difference persists when contact‑free windows are examined and when only the first 50 steps are considered, indicating that the artifact precedes any terminal contact and is not merely a downstream symptom of failure.
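A permutation test of this kind can be sketched as follows. The summary does not say which test statistic or sidedness the authors used, so the two-sided difference of group means and the add-one p-value correction below are assumptions, and `permutation_pvalue` is a hypothetical helper name.

```python
import numpy as np

def permutation_pvalue(failed, succeeded, n_perm=20_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    failed, succeeded : per-episode jerk contrasts for each outcome group
    Returns (observed_difference, p_value).
    """
    rng = np.random.default_rng(seed)
    failed = np.asarray(failed, dtype=float)
    succeeded = np.asarray(succeeded, dtype=float)

    observed = failed.mean() - succeeded.mean()
    pooled = np.concatenate([failed, succeeded])
    n_failed = len(failed)

    extreme = 0
    for _ in range(n_perm):
        # Shuffle outcome labels by permuting the pooled values.
        perm = rng.permutation(pooled)
        diff = perm[:n_failed].mean() - perm[n_failed:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)
```

With 20 000 permutations and this correction, the smallest attainable p-value is 1/20 001, roughly 5 × 10⁻⁵, so a reported p < 10⁻⁴ is consistent with essentially no shuffled labeling matching the observed gap.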
To probe causality, the authors fix the observation context and vary only the latent noise z. Sampling 384 independent noise vectors across 16 contexts produces a cross‑context standard deviation of 0.040 in the jerk contrast, demonstrating that noise alone can modulate the artifact. A decomposition experiment varies the current‑chunk noise z₀, the next‑chunk noise z₁, or both. Both components contribute non‑trivially, with the relative importance of z₀ versus z₁ depending on the task (e.g., z₁ dominates for the harder LIBERO‑10 task 8, while z₀ is slightly stronger for another task). This shows that the boundary artifact depends on the pair of consecutive noise draws that meet at a boundary, not on either draw alone.
Having established a noise‑artifact link, the authors search for directional structure in the latent space. For each fixed context they sample 12 random directions, evaluate the induced artifact change, and select the direction d* that maximizes the absolute difference between positive and negative steering extremes. Sweeping a scalar steering coefficient α ∈ {‑1, ‑0.5, ‑0.25, 0, 0.25, 0.5, 1} along d* yields a monotonic relationship with both boundary transition jerk (Pearson r ≈ 0.90) and the jerk contrast (r ≈ 0.97). The induced artifact range (≈ 0.108) is comparable to the observed success‑failure gap, confirming that targeted steering can produce practically meaningful changes.
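The direction search and coefficient sweep described above can be sketched as below. Everything about the policy is stubbed out: `toy_artifact` is a hypothetical linear stand-in for "run the frozen policy from a fixed context with noise z and measure the artifact", and unit-normalized Gaussian directions are an assumption, since the summary does not say how directions are sampled or scaled.

```python
import numpy as np

# Steering coefficients listed in the text.
ALPHAS = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])

# Hypothetical stand-in for the real measurement (policy rollout + jerk contrast).
_W = np.linspace(0.5, 2.0, 8)

def toy_artifact(z):
    return float(_W @ z)

def select_direction(artifact_fn, z, n_dirs=12, seed=0):
    """Pick the candidate direction with the largest |artifact(+d) - artifact(-d)|."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, z.shape[0]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit length (assumed)
    gaps = np.array([artifact_fn(z + d) - artifact_fn(z - d) for d in dirs])
    best = int(np.argmax(np.abs(gaps)))
    return dirs[best], gaps[best]

def sweep(artifact_fn, z, d_star, alphas=ALPHAS):
    """Evaluate the artifact along z + alpha * d_star and its Pearson r with alpha."""
    values = np.array([artifact_fn(z + a * d_star) for a in alphas])
    r = np.corrcoef(alphas, values)[0, 1]
    return values, r

z0 = np.zeros(8)                       # base noise vector (toy dimension)
d_star, gap = select_direction(toy_artifact, z0)
values, r = sweep(toy_artifact, z0, d_star)
```

Because the toy artifact is exactly linear in z, the sweep is perfectly monotonic; with a real policy the relationship would only be approximately monotonic, as the reported correlations of about 0.90 and 0.97 suggest. The sketch does not orient the sign of d*, so whether positive α raises or lowers the artifact depends on the sampled direction.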
Finally, the authors extend steering to the full trajectory. They apply the same direction d* at every replanning step, with either a “good” (positive α) or “bad” (negative α) intervention of equal magnitude. Experiments are split into two regimes: (1) ceiling tasks where the baseline policy already achieves near‑perfect success, and (2) non‑ceiling tasks where there is headroom for improvement. In ceiling tasks, baseline and good‑steering both hold a success rate of 1.000, whereas bad‑steering drops it to 0.762; the jerk contrast ordering (good < baseline < bad) is preserved. In non‑ceiling tasks, good‑steering raises success from 0.674 to 0.791 and reduces the jerk contrast, whereas bad‑steering lowers success to 0.535 and increases the contrast. These results indicate that the boundary artifact is not merely an observable nuisance; it is an intervenable causal factor that can be attenuated or amplified via latent‑space steering, with corresponding effects on task performance when the policy is not saturated.
In summary, the paper makes three key contributions: (1) it identifies chunk‑boundary artifact as a statistically significant predictor of failure, (2) it shows that the artifact is predominantly driven by latent noise rather than observation variability, and (3) it provides a practical method, directional noise steering, to control the artifact and, in hard tasks, to improve success rates without retraining the policy. This reframes a long‑standing execution‑time issue as a tractable, noise‑dominated failure mechanism amenable to analysis and intervention.