Differential Performance Debugging with Discriminant Regression Trees
Differential performance debugging is a technique for finding performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences in asymptotic performance among various input classes in terms of program internals. We propose a data-driven technique based on discriminant regression tree (DRT) learning, where the goal is to discriminate among different classes of inputs. We propose a new algorithm for DRT learning that first clusters the data into functional clusters, capturing different asymptotic performance classes, and then invokes off-the-shelf decision-tree learning algorithms to explain these clusters. We focus on linear functional clusters and adapt classical clustering algorithms (K-means and spectral) to produce them. For the K-means algorithm, we generalize the notion of the cluster centroid from a point to a linear function. We adapt spectral clustering by defining a novel kernel function to capture the notion of linear similarity between two data points. We evaluate our approach on benchmarks consisting of Java programs in which we are interested in debugging performance. We show that our algorithm significantly outperforms other well-known regression-tree learning algorithms in terms of running time and accuracy of classification.
💡 Research Summary
The paper addresses the problem of differential performance debugging: identifying and explaining why a program exhibits markedly different execution times (or other performance metrics) for inputs that are otherwise similar. Traditional static analysis focuses on functional correctness and does not scale to performance properties, while dynamic profiling typically examines individual traces and cannot directly compare distinct input classes. To fill this gap, the authors propose a data‑driven approach based on Discriminant Regression Trees (DRTs).
A DRT is a hybrid of a regression tree and a classification tree. Its internal nodes contain predicates over auxiliary variables (e.g., counts of specific function calls, boolean flags), while each leaf stores an affine function that models the primary performance output (such as runtime) as a linear function of the input variables (e.g., input size). Consequently, a DRT not only classifies a trace into a performance class but also provides a concise, human‑readable explanation of the class in terms of program internals.
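To make the structure concrete, here is a minimal Python sketch of a DRT: internal nodes test auxiliary variables, and each leaf applies an affine model to the primary input variable. The variable name `decompress_calls`, the threshold, and all coefficients are hypothetical, chosen only to illustrate the data structure, not taken from the paper.

```python
# Sketch of a DRT: internal nodes hold predicates over auxiliary
# variables; leaves hold affine models y = slope * x + intercept.

class Leaf:
    def __init__(self, slope, intercept):
        self.slope, self.intercept = slope, intercept

    def predict(self, x):
        # Affine performance model at the leaf.
        return self.slope * x + self.intercept

class Node:
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right  # left branch: aux[feature] <= threshold

    def predict(self, x, aux):
        branch = self.left if aux[self.feature] <= self.threshold else self.right
        return branch.predict(x, aux) if isinstance(branch, Node) else branch.predict(x)

# A two-leaf DRT: traces that call a (hypothetical) decompression
# routine fall on a much steeper runtime line than those that do not.
drt = Node("decompress_calls", 0,
           Leaf(slope=0.5, intercept=2.0),   # fast class
           Leaf(slope=3.5, intercept=10.0))  # slow class

print(drt.predict(100, {"decompress_calls": 0}))   # fast class: 52.0
print(drt.predict(100, {"decompress_calls": 7}))   # slow class: 360.0
```

The tree thus answers two questions at once: *which* performance class a trace belongs to (the path of predicates) and *how* performance scales within that class (the leaf's affine model).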
Learning a DRT is decomposed into two stages:
1. Functional (linear) clustering of the trace data using only input and output variables. The goal is to partition the (input, output) points into K clusters, each approximated by a distinct linear function. The authors adapt two classic clustering algorithms:
- K‑means is generalized so that each cluster centroid is a linear function (slope and intercept) rather than a point. Assignment of a point to a cluster is based on the smallest squared residual to the cluster’s line.
- Spectral clustering is equipped with a novel kernel that measures “linear similarity”: two points are considered similar if the line passing through them also fits many other points. This kernel yields a similarity matrix on which standard spectral techniques (eigen‑decomposition + K‑means) are applied.
The clustering step returns the smallest number of linear clusters K′ that achieves a mean-squared error below a user-specified threshold ε. If K′ exceeds a pre-set maximum K, the algorithm aborts.
2. Decision-tree learning on auxiliary variables. After clustering, each trace receives a label indicating its linear cluster. The auxiliary variables (function-call counts, flags, etc.) become the sole features for a conventional classification-tree learner (e.g., CART or C4.5). Cross-validation is used to prune the tree and control over-fitting. Finally, the leaf labels are replaced by the corresponding linear functions obtained in step 1, yielding the full DRT.
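The generalized K-means of step 1 can be sketched as follows. This is a minimal NumPy version under simplifying assumptions: the initialization by slope quantiles is a heuristic of our own, not necessarily the paper's; the centroid update (least-squares line fit) and the residual-based assignment follow the description above.

```python
import numpy as np

def functional_kmeans(x, y, k, iters=50):
    # Heuristic init: split points by quantiles of their slope to the origin.
    ratios = y / x
    qs = np.quantile(ratios, np.linspace(0, 1, k + 1)[1:-1])
    labels = np.digitize(ratios, qs)
    lines = np.zeros((k, 2))  # one (slope, intercept) centroid per cluster
    for _ in range(iters):
        # Update step: refit each centroid line to its assigned points.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= 2:
                lines[j] = np.polyfit(x[mask], y[mask], 1)
        # Assignment step: smallest squared residual to a centroid line.
        residuals = np.stack([(y - (a * x + b)) ** 2 for a, b in lines])
        new_labels = residuals.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
    return labels, lines

# Two synthetic performance classes: y = 2x + 1 and y = 10x + 5.
x = np.tile(np.arange(1.0, 51.0), 2)
y = np.concatenate([2 * np.arange(1.0, 51.0) + 1, 10 * np.arange(1.0, 51.0) + 5])
labels, lines = functional_kmeans(x, y, k=2)
```

The only structural change relative to ordinary K-means is the centroid type: a `(slope, intercept)` pair instead of a point, with squared residual to the line playing the role of squared distance.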
Algorithm 1 in the paper formalizes this pipeline: extract (x, y) pairs, perform linear clustering, check size constraints, construct the labeled dataset (Z, ℓ), where Z contains the auxiliary variables and ℓ the cluster IDs, learn a decision tree, and embed the affine models.
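On toy data, this pipeline can be sketched end to end. The auxiliary variable `calls` is hypothetical, and a one-level decision stump stands in for the off-the-shelf decision-tree learner; both are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

# Toy traces: x = input size, y = runtime, calls = count of a hypothetical
# auxiliary function; traces that invoke it lie on the slow line.
n = 40
x = np.tile(np.arange(1.0, n + 1), 2)
calls = np.concatenate([np.zeros(n), np.full(n, 3.0)])
y = np.where(calls > 0, 10 * x + 5, 2 * x + 1)

# Step 1: linear clustering (trivial two-line split by per-point slope).
labels = (y / x > np.median(y / x)).astype(int)
lines = [np.polyfit(x[labels == j], y[labels == j], 1) for j in (0, 1)]

# Step 2: explain the cluster labels with a stump over the auxiliary
# variable: pick the threshold on `calls` that best separates the clusters.
best = min(np.unique(calls),
           key=lambda t: min((labels != (calls > t)).mean(),
                             (labels != (calls <= t)).mean()))

# Step 3: attach the affine leaf models to the stump's branches.
print(f"split: calls > {best}")          # split: calls > 0.0
for j, (a, b) in enumerate(lines):
    print(f"cluster {j}: y = {a:.1f}*x + {b:.1f}")
```

Note that step 2 never sees x or y: the clusters are discovered from the performance data alone, and the auxiliary variables are used only to *explain* them, which is what makes the resulting tree a diagnosis rather than just a fit.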
The authors implemented the method in a prototype tool called DPDEBUGGER and evaluated it on several Java benchmarks, notably Apache FOP (a PDF/PS rendering library) and JFreeChart. In the FOP case study, two PNG images of identical byte size exhibited a seven‑fold runtime difference. DPDEBUGGER’s DRT identified two decisive predicates: (i) whether the function encodeRenderImage‑RGB was invoked, and (ii) the number of calls to getICCProfile. The analysis revealed that the slower images required color‑profile decompression or exceeded dimension limits, explaining the performance gap without any code change.
Quantitatively, the proposed approach outperformed state-of-the-art regression-tree learners such as M5Prime and GUIDE. While M5Prime required ~97 seconds and GUIDE ~1233 seconds to build models on the same data, DPDEBUGGER completed the task in 14.4 seconds, a speed-up of roughly 7x and 85x, respectively. Moreover, the DRT's classification accuracy (i.e., correctly assigning traces to the right linear cluster) was higher, and the resulting tree was considerably smaller, enhancing interpretability.
Key contributions highlighted by the authors are:
- Definition of discriminant regression trees tailored for differential performance debugging.
- Novel extensions of K‑means and spectral clustering to produce functional (linear) clusters.
- A two‑phase learning framework that reduces the NP‑hard discriminant‑learning problem to tractable clustering and classification sub‑problems.
- An open‑source implementation and empirical evidence of superior scalability and explanatory power compared to existing regression‑tree methods.
The paper also discusses limitations and future directions. The current method assumes that performance differences can be captured by piecewise‑linear models; extending to polynomial or non‑linear functional clusters could broaden applicability. Selecting the appropriate number of clusters K remains a user‑driven choice, though model‑selection criteria (e.g., BIC) could be integrated. Finally, richer auxiliary information (e.g., hardware counters, memory‑access patterns) and multi‑objective performance metrics (time, memory, energy) are promising avenues for further research.
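As an illustration of how BIC-based selection of K might look (a sketch under simplifying assumptions, not something the paper implements), one can score each candidate k by BIC = n·ln(RSS/n) + p·ln(n), with p = 2k line parameters, and keep the minimizer:

```python
import numpy as np

def clustered_rss(x, y, k):
    # Crude k-line fit: split points by slope quantiles, then least squares.
    ratios = y / x
    qs = np.quantile(ratios, np.linspace(0, 1, k + 1)[1:-1])
    labels = np.digitize(ratios, qs)
    rss = 0.0
    for j in range(k):
        mask = labels == j
        if mask.sum() >= 2:
            a, b = np.polyfit(x[mask], y[mask], 1)
            rss += float(((y[mask] - (a * x[mask] + b)) ** 2).sum())
    return rss

# Two exact linear classes; BIC should prefer k = 2 over k = 1 or 3.
x = np.tile(np.arange(1.0, 41.0), 2)
y = np.concatenate([2 * x[:40] + 1, 10 * x[:40] + 5])
n = len(x)
bic = {k: n * np.log(clustered_rss(x, y, k) / n + 1e-12) + 2 * k * np.log(n)
       for k in (1, 2, 3)}
best_k = min(bic, key=bic.get)
```

Under-fitting (k = 1) is punished by a large residual, while over-fitting (k = 3) is punished by the parameter penalty, so the criterion could in principle replace the user-supplied K.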
In summary, the work presents a practical, efficient, and interpretable technique for pinpointing the root causes of performance anomalies across input classes. By coupling linear clustering with conventional decision‑tree learning, it delivers concise explanations that help developers quickly decide whether a performance discrepancy stems from algorithmic complexity, data‑dependent behavior, or genuine bugs, thereby advancing the state of the art in performance debugging.