Significance of parallel computing on the performance of Digital Image Correlation algorithms in MATLAB
Digital Image Correlation (DIC) is a powerful tool used to evaluate displacements and deformations in a non-intrusive manner. By comparing two images, one of the undeformed reference state of a specimen and another of the deformed target state, the relative displacement between those two states is determined. DIC is well known and often used for post-processing analysis of in-plane displacements and deformation of specimens. Increasing the analysis speed to enable real-time DIC analysis would be beneficial and would extend the field of use of this technique. Here we tested several combinations of the most common DIC methods together with different parallelization approaches in MATLAB and evaluated their performance to determine whether real-time analysis is possible with these methods. To reflect improvements in computing technology, different hardware settings were also analysed. We found that implementation problems can reduce the efficiency of a theoretically superior algorithm such that it becomes slower in practice than a sub-optimal algorithm. The Newton-Raphson algorithm in combination with a modified Particle Swarm algorithm in parallel image computation was found to be most effective. This is contrary to theory, which suggests that the inverse-compositional Gauss-Newton algorithm is superior. As expected, the Brute Force Search algorithm is the least effective method. We also found that the correct choice of parallelization tasks is crucial to achieve improvements in computing speed. A poorly chosen parallelization approach with high parallel overhead leads to inferior performance. Finally, irrespective of the computing mode, the correct choice of combinations of integer-pixel and sub-pixel search algorithms is decisive for an efficient analysis. Using currently available hardware, real-time analysis at high frame rates remains an aspiration.
💡 Research Summary
The paper investigates how parallel computing can accelerate Digital Image Correlation (DIC) algorithms implemented in MATLAB. DIC determines displacement fields by comparing a reference image of an undeformed specimen with a target image of the same specimen after deformation. The authors focus on two essential stages of DIC: integer‑pixel search, which provides a coarse displacement estimate, and sub‑pixel refinement, which yields sub‑pixel accuracy.
For the integer‑pixel stage they implement three methods: a brute‑force exhaustive search (BFS), a standard Particle Swarm Optimization (PSO), and a modified PSO that incorporates a gradient‑descent “star search” sub‑routine. For sub‑pixel refinement they implement Newton‑Raphson (NR) and two variants of the Inverse‑Compositional Gauss‑Newton (IC‑GN) algorithm – one written from scratch and one taken from the Baker‑Matthews publication. All methods use a zero‑normalized cross‑correlation (ZNCC) metric and limit the search window to a 25‑pixel radius, assuming small inter‑frame deformations suitable for real‑time operation.
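To make the ZNCC metric and the exhaustive integer‑pixel search concrete, here is a minimal sketch in plain Python (the paper's own code is MATLAB and is not reproduced here). The function names, the square search window, and the synthetic test image are assumptions made for illustration; the demo uses a small radius rather than the 25 pixels mentioned above to keep it quick.

```python
import math
import random


def zncc(a, b):
    """Zero-normalized cross-correlation of two equally sized 2-D patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    da = [v - mean_a for v in fa]
    db = [v - mean_b for v in fb]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treated as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom


def patch(img, top, left, size):
    """Cut a size-by-size window out of a 2-D list of intensities."""
    return [row[left:left + size] for row in img[top:top + size]]


def brute_force_search(ref, tgt, top, left, size, radius):
    """Test every integer shift (dy, dx) in a square window of +/- radius.

    Returns (best_score, dy, dx). The square window is an assumption of
    this sketch, not a detail taken from the paper.
    """
    template = patch(ref, top, left, size)
    height, width = len(tgt), len(tgt[0])
    best = (-2.0, 0, 0)  # ZNCC lies in [-1, 1], so -2 is below any score
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate window would leave the image
            score = zncc(template, patch(tgt, t, l, size))
            if score > best[0]:
                best = (score, dy, dx)
    return best


# Synthetic demo: the target is the reference shifted by 2 rows and 3 columns.
rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(40)] for _ in range(40)]
tgt = [[ref[y - 2][x - 3] if y >= 2 and x >= 3 else 0 for x in range(40)]
       for y in range(40)]

score, dy, dx = brute_force_search(ref, tgt, top=15, left=15, size=9, radius=5)
print(dy, dx, round(score, 6))
```

The cost of this search grows with the square of the radius, since every candidate shift requires a full ZNCC evaluation. That is the expense a swarm‑based search tries to avoid by sampling only a subset of candidate positions.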
Parallelization is explored along two orthogonal axes. “Sub‑image parallelism” distributes the processing of individual sub‑images (i.e., correlation windows) across workers, while “image‑pair parallelism” distributes whole reference‑target image pairs. MATLAB’s Parallel Computing Toolbox is used for CPU parallelism via parfor and parfeval, and for GPU acceleration via the gpuArray interface, which runs on NVIDIA CUDA devices. The authors note that GPU acceleration suffers from data‑transfer bottlenecks because each computation requires moving large image blocks across the PCIe bus. CPU parallelism also incurs copying overhead, as MATLAB creates a separate copy of each variable for every worker, but this overhead is generally smaller than on the GPU.
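The two decomposition strategies can be pictured with a short Python sketch. It is only a structural analogy to a MATLAB parfor loop, not the authors' code: `process_subset` and `process_image_pair` are hypothetical placeholders, and Python threads are used purely to show how work is split. For pure‑Python workloads, threads do not give true CPU parallelism because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, chunk_size):
    """Split a list into consecutive batches of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_subset(subset_id):
    """Hypothetical stand-in for analysing one correlation window."""
    return subset_id, subset_id * subset_id  # placeholder "result"


def process_chunk(subset_ids):
    """One task: a batch of subsets handled sequentially by one worker."""
    return [process_subset(s) for s in subset_ids]


def run_sub_image_parallel(subset_ids, workers, chunk_size):
    """Sub-image parallelism: batches of subsets are spread over workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(process_chunk, chunk(subset_ids, chunk_size)))
    return [result for batch in batches for result in batch]


def process_image_pair(pair_id, subset_ids):
    """Hypothetical stand-in: one worker analyses a whole image pair."""
    return pair_id, [process_subset(s) for s in subset_ids]


def run_image_pair_parallel(pair_ids, subset_ids, workers):
    """Image-pair parallelism: each task is a complete reference/target pair."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: process_image_pair(p, subset_ids),
                             pair_ids))


subsets = list(range(100))
per_subset = run_sub_image_parallel(subsets, workers=4, chunk_size=16)
per_pair = run_image_pair_parallel([0, 1, 2], subsets[:5], workers=3)
print(len(chunk(subsets, 16)), len(per_subset), len(per_pair))
```

The `chunk_size` parameter is the granularity knob. Very small batches produce many tasks and more scheduling overhead, while very large batches leave some workers idle once the task list runs out.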
The experimental platform consists of a modular DIC code that allows any combination of the integer‑pixel and sub‑pixel algorithms, together with either parallelization strategy. Two real data sets are used: (1) 204 images of a rectangular specimen under tensile loading captured in the RMIT Materials Laboratory, and (2) 11 images of a different deformation process. Tests are performed on two hardware configurations: a modern multi‑core CPU (up to 12 cores) and an NVIDIA RTX‑3060 GPU.
Key findings:
- Algorithmic performance – Contrary to theoretical expectations, the IC‑GN method is slower in practice than the NR method. The extra pre‑computations and higher‑level MATLAB function calls in the IC‑GN implementation introduce overhead that outweighs its reduced iteration count. The modified PSO, which limits the number of particles and generations (max 5 generations, correlation threshold 0.75), finds a near‑optimal integer‑pixel displacement quickly and, when combined with NR, yields the best overall speed‑accuracy trade‑off. BFS remains the slowest due to its exhaustive evaluation of all candidate positions.
- Parallelization impact – Sub‑image parallelism generally outperforms image‑pair parallelism because it reduces the amount of data duplicated per worker. When the granularity of tasks is too fine, scheduling overhead dominates, degrading performance. Conversely, overly coarse tasks cause load imbalance. An optimal granularity was found to be 8–16 sub‑images per worker on the tested CPU.
- CPU vs. GPU – On the tested GPU, the cost of transferring image data to and from device memory dominates the runtime, making CPU parfor faster for the DIC workloads considered. The GPU shows an advantage only for very large, compute‑intensive kernels, which are not typical in DIC where many small correlation windows are processed.
- Real‑time feasibility – Even with the best combination (modified PSO + NR, sub‑image parallelism on a 12‑core CPU), the achievable frame rate is below 30 fps for the tested image sizes. Achieving higher frame rates would require either algorithmic redesign (e.g., predictive deformation models to reduce search windows) or dedicated hardware accelerators such as FPGAs or ASICs.
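The Newton‑Raphson refinement referred to in the findings above can be illustrated with a deliberately simplified one‑dimensional sketch in Python. Several things here are assumptions rather than details from the paper: the intensity profile is an analytic function instead of an interpolated image, only a pure translation is estimated, a sum‑of‑squared‑differences cost replaces ZNCC for brevity, and derivatives are taken by finite differences.

```python
import math

TRUE_SHIFT = 0.3  # sub-pixel displacement the sketch should recover


def reference(x):
    """Hypothetical 1-D intensity profile standing in for the reference subset."""
    return math.sin(x) + 0.5 * math.sin(2.3 * x)


def target(x):
    """The same profile displaced by TRUE_SHIFT (the 'deformed' state)."""
    return reference(x - TRUE_SHIFT)


SAMPLES = [0.2 * i for i in range(40)]  # sample positions inside the subset


def cost(u):
    """Sum of squared differences between reference and target shifted back by u."""
    return sum((reference(x) - target(x + u)) ** 2 for x in SAMPLES)


def newton_raphson(u0, iterations=10, h=1e-4, tol=1e-8):
    """Iterate u <- u - C'(u) / C''(u) starting from the integer-pixel guess u0."""
    u = u0
    for _ in range(iterations):
        grad = (cost(u + h) - cost(u - h)) / (2 * h)
        hess = (cost(u + h) - 2 * cost(u) + cost(u - h)) / (h * h)
        if hess <= 0:
            break  # not in a convex region; a better initial guess is needed
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    return u


estimate = newton_raphson(u0=0.0)
print(round(estimate, 4))
```

The early exit on a non‑positive second derivative shows why the integer‑pixel stage matters: the iteration only converges reliably when its starting point is already close to the true displacement.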
The authors conclude that successful real‑time DIC hinges on (i) selecting lightweight integer‑pixel search algorithms (modified PSO), (ii) using a fast sub‑pixel refinement (NR), (iii) employing CPU‑based sub‑image parallelism with carefully chosen task granularity, and (iv) minimizing data copying and transfer overhead. The study provides a thorough, empirical assessment of how algorithmic choices and parallelization strategies interact in MATLAB, offering practical guidance for researchers aiming to push DIC toward real‑time operation.