Sparse Bayesian step-filtering for high-throughput analysis of molecular machine dynamics

Nature has evolved many molecular machines such as kinesin, myosin, and the rotary flagellar motor powered by an ion current across the cell membrane. Direct observation of the step-like motion of these machines with time series from novel experimental …

Authors: Max A. Little, Nick S. Jones

Nanotechnology promises extraordinary control over matter at the molecular scale, with widespread applications from materials to medicine. However, constructing useful machines at this scale is enormously challenging because of molecular noise and the complexity of engineering precise conformational dynamics of interacting molecular components. Nature has evolved many robust molecular machines such as pumps, tugs, copiers and motors, and understanding the function of these machines offers the possibility of the biomimetic design of interacting, artificial molecular devices. Biophysical theory proposes that these motors, which convert electrochemical energy into linear or rotary kinetic energy, do so in a series of rapid, nanoscale, step-like motions because this maximizes the use of the free energy available in chemical bonds [1]. This step-like motion has recently been observed in vivo, using advanced experimental assays that exploit techniques such as Förster resonance energy transfer, optical traps or atomic force microscopy [2]. These assays produce large volumes of time series data with sampling rates that often exceed 100 kHz, but the step-like motions of the molecular components are inevitably obscured by Brownian and experimental noise [1,3].

This noise must be removed to extract the underlying molecular conformational dynamics. Noise removal is a challenging signal processing problem here, because the signal-to-noise ratio is low and the signal itself is step- and impulse-like, i.e. highly discontinuous. The signal therefore changes on time scales as short as those of the noise, so the supports of the signal and of the noise in the Fourier domain overlap considerably, and separating them by classical linear filtering is largely intractable. Special techniques are required that can cope with both high noise levels and discontinuous signals. Although special step-smoothing filters do exist [4], they have various shortcomings.
Nonlinear adaptations of the classical running mean filter [5] lack statistical robustness, given the fundamental frequency inseparability of steps from noise. Other filters are based on a greedy (locally optimal), successive-subdivision search for the best step locations, but they are computationally intensive and the solutions they find may be suboptimal [6,7]. There has also been much recent work on hidden Markov modeling to solve similar filtering problems [8], but the computational complexity of statistical inference with these techniques is substantial, and generally prohibitive in high-throughput experimental situations. There is therefore a need for robust step-smoothing filters that produce optimal solutions with minimal computational effort and good statistical power.

A typical noisy step-like time series from an experimental assay of a molecular machine is depicted in Fig. 1(a). The prototypical time series is a sequence of piecewise-constant segments with superimposed, additive, i.i.d. Gaussian noise. The steps between the constant segments typically occur at exponentially distributed time intervals, and the steps may be upwards or downwards. This is very similar to a Poisson process, except that the event count can go down as well as up. The observed discrete-time signal is defined here as

$$x_n = \mu_n + \varepsilon_n, \qquad \varepsilon_n \sim \mathcal{N}(0, \sigma^2), \qquad n = 1, 2, \ldots, N, \tag{1}$$

where $\mu_n$ is the piecewise-constant step signal and $\varepsilon_n$ is the noise.

As a simple reference algorithm that is very commonly used in this application, we investigate the classical running median filter [9], which replaces the middle sample of a moving window that runs through the time series with the median of the samples in that window. The only parameter is the window length $W$. This filter is a potentially useful candidate for step-filtering because it is computationally and conceptually very simple and, unlike the running mean filter, it leaves step- and impulse-like root signals unchanged. If a signal is run through the filter a sufficient number of times, only a root signal remains.
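The signal model and the reference filter can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the equal up/down step probabilities, the rounding of dwell times to whole samples and the edge-repeat padding at the ends of the series are all assumptions made for the example.

```python
import numpy as np


def synthetic_steps(n_samples, mean_dwell, step_height=1.0, noise_sd=0.5, rng=None):
    """Return (x, mu): a piecewise-constant signal mu with exponentially
    distributed dwell times and random up/down steps, and x = mu + Gaussian noise."""
    rng = np.random.default_rng(rng)
    mu = np.empty(n_samples)
    level, n = 0.0, 0
    while n < n_samples:
        # Dwell time before the next step, rounded to at least one sample.
        dwell = max(1, int(round(rng.exponential(mean_dwell))))
        mu[n:n + dwell] = level
        n += dwell
        # Next step goes up or down with equal probability (an assumption).
        level += step_height * rng.choice([-1.0, 1.0])
    x = mu + rng.normal(0.0, noise_sd, n_samples)
    return x, mu


def running_median(x, W):
    """Replace each sample by the median of the odd-length window of W samples
    centred on it; the ends are padded by repeating the first/last sample."""
    if W % 2 == 0:
        raise ValueError("window length W must be odd")
    padded = np.pad(np.asarray(x, dtype=float), W // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, W)
    return np.median(windows, axis=1)


def median_root(x, W, max_passes=1000):
    """Re-apply the running median until the output no longer changes,
    i.e. until a root signal of the filter has been reached."""
    y = np.asarray(x, dtype=float)
    for _ in range(max_passes):
        z = running_median(y, W)
        if np.array_equal(z, y):
            break
        y = z
    return y
```

For a clean unit step `np.r_[np.zeros(10), np.ones(10)]`, `running_median(step, 5)` returns the step unchanged, whereas a five-sample running mean would produce the intermediate values 0.2, 0.4, 0.6 and 0.8 around the edge. `median_root` illustrates the repeated filtering that reduces any input to a root signal.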
Another justification for this filter is that the median is the maximum-likelihood estimate of the location $m$ of the distribution of the samples in the window, if they are Laplace-distributed. The negative log-likelihood of the window samples is then

$$-\ln p\left(\{x_i\}_{i \in w} \mid m\right) = \frac{1}{a} \sum_{i \in w} \left| x_i - m \right| + \ln A, \tag{2}$$

where $w$ is the index set of the $W$ samples in each window, $A$ is an unimportant normalization factor, and $a$ is the spread of the Laplace distribution. Minimizing this function with respect to $m$ is equivalent to minimizing $\sum_{i \in w} |x_i - m|$, which is achieved when $m$ is the median of the samples [10].

The running median filter has some shortcomings that make it less than ideal for step-filtering. In particular, its root signals include monotonic sections (up or down ramps), which cannot occur in the piecewise-constant $\mu_n$ by definition. As an improvement, if we assume we know the step positions, we can place a Bayesian Laplace mixture prior over the estimate $m$ for each window. The resulting negative log posterior contains an unimportant additive constant $K$, which can be ignored when finding the maximum a posteriori (MAP) solution $m$ for each window because it does not depend on $m$. Unfortunately, since this negative log posterior is not a convex function of $m$, standard local minimization algorithms are not guaranteed to find the global minimum. However, since there is only one parameter $m$, a first-pass brute-force search for a rough estimate of $m$ can be combined with a second-pass golden-section search refinement [11]. This is guaranteed to converge on the optimum if the optimum lies within the range of the first-pass search and the search grid is fine enough to bracket it.

The above filter requires knowledge of the step positions. In certain experimental situations these are known, for example in high-throughput DNA sequencing by nanopore blockade current. In other circumstances, step positions will be unknown. Furthermore, the non-convexity of the negative log posterior makes finding the optimal $m$ difficult. Also, this windowed algorithm may fail to smooth out steps longer than the window size, or may erroneously smooth away multiple steps within a single window.
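The two-pass search for the windowed MAP estimate can be sketched as follows. The mixture prior here is an assumption made for illustration, interpreting the known "step positions" as known signal levels: equal-weight Laplace components with a common spread $b$, centred on hypothetical levels $c_j$, giving $E(m) = \frac{1}{a}\sum_{i \in w}|x_i - m| - \ln \sum_j e^{-|m - c_j|/b}$ up to the constant $K$. The paper's exact prior may differ, and all function names and default values are hypothetical.

```python
import numpy as np


def neg_log_posterior(m, window, levels, a=1.0, b=0.1):
    """E(m) without the constant K: Laplace likelihood of the window samples
    plus an equal-weight Laplace mixture prior centred on the given levels."""
    levels = np.asarray(levels, dtype=float)
    nll = np.sum(np.abs(window - m)) / a
    log_prior = np.logaddexp.reduce(-np.abs(m - levels) / b)
    return nll - log_prior


def golden_section(f, lo, hi, tol=1e-9):
    """Minimise f on [lo, hi] by golden-section search (assumes f is unimodal there)."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:          # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def map_window_estimate(window, levels, a=1.0, b=0.1, n_grid=201):
    """Two-pass MAP estimate of m: a brute-force grid search, then golden-section
    refinement between the grid neighbours of the best grid point."""
    window = np.asarray(window, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # E(m) increases outside the span of the data and the levels, so search there.
    lo = min(window.min(), levels.min())
    hi = max(window.max(), levels.max())
    grid = np.linspace(lo, hi, n_grid)
    energies = [neg_log_posterior(m, window, levels, a, b) for m in grid]
    k = int(np.argmin(energies))
    left, right = grid[max(k - 1, 0)], grid[min(k + 1, n_grid - 1)]
    return golden_section(lambda m: neg_log_posterior(m, window, levels, a, b), left, right)


def bayes_median_filter(x, W, levels, a=1.0, b=0.1):
    """Slide an odd-length window of W samples through x (edges padded by
    repetition) and replace each sample by the window's MAP estimate."""
    padded = np.pad(np.asarray(x, dtype=float), W // 2, mode="edge")
    return np.array([map_window_estimate(padded[n:n + W], levels, a, b)
                     for n in range(len(x))])
```

With a very broad prior (large `b`) the estimate reduces to the ordinary window median, matching the maximum-likelihood argument above; with a narrow prior it snaps to a nearby assumed level. The grid resolution `n_grid` controls whether the bracket handed to the golden-section search actually contains the global minimum.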
A different approach is offered by global filtering, which finds an optimal solution for the whole time series at once, rather than considering a sliding window of samples. It is reasonable to assume that the additive error around each step is Gaussian, and thus we take the likelihood to be Gaussian. Furthermore, since the most prominent property of step-like signals is that many adjacent samples $\mu_n$ take the same value, the $L_1$-regularized fused LASSO [12] offers a plausible Bayesian method. This prescription allows us to write down the following negative log posterior:

$$E(m) = \frac{1}{2} \sum_{n=1}^{N} \left( x_n - m_n \right)^2 + \lambda \sum_{n=2}^{N} \left| m_n - m_{n-1} \right| + K, \tag{4}$$

where $\lambda$ is a regularization parameter, $N$ is the number of samples in the time series, and $K$ is another unimportant constant. For $\lambda = 0$, the optimal solution $m$ is identical to $x$, and as $\lambda \to \infty$, all adjacent $m_n$ take the same value, so that $m_n \to k$, where $k$ is the mean of the data. For intermediate values of $\lambda$, the solution is a piecewise-constant curve with a finite number of steps that trades off least-squares fidelity to the data against the total absolute size of the steps. This piecewise-constant property emerges because most of the differences between adjacent $m_n$ are exactly zero: this is the sparsity-enhancing property of the Laplace prior [12], which has proved valuable in compressive sensing applications. A convenient aspect of (4) is that it is a convex quadratic programming problem, guaranteeing that a globally optimal solution can be found rapidly to arbitrary precision using standard algorithms [10]. Note that (4) is similar to the piecewise-linear smoothing algorithm proposed by Kim et al. [13].

The accuracy of each of the three algorithms (1. median filtering, 2. Bayesian Laplace mixture prior median filtering, and 3. Bayesian $L_1$-regularized fused-LASSO filtering) was assessed on synthetic time series by the mean absolute error (MAE) between each estimate and the known step-like signal. These three algorithms have several (hyper)parameters that must be chosen, for example the window size $W$ and the regularization parameter $\lambda$.
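Because (4) is convex, any convergent convex solver reaches the global optimum. The sketch below drops the constant $K$ and minimises $\frac{1}{2}\sum_n (x_n - m_n)^2 + \lambda \sum_n |m_n - m_{n-1}|$ by projected gradient descent on its box-constrained dual problem, a technique swapped in here for brevity; the text itself only refers to standard quadratic programming algorithms. The dual variables $u_n$ are confined to $[-\lambda, \lambda]$ and the estimate is recovered as $m = x - D^\top u$, where $D$ is the first-difference operator. The function names are hypothetical, and the ½ scaling of the squared error (noise variance absorbed into $\lambda$) is an assumption.

```python
import numpy as np


def apply_dt(u):
    """Apply D^T, the transpose of the first-difference operator (D m)[n] = m[n+1] - m[n]."""
    return -np.diff(np.concatenate(([0.0], u, [0.0])))


def fused_lasso_1d(x, lam, n_iter=5000):
    """Approximately minimise 0.5*sum((x - m)**2) + lam*sum(|m[n] - m[n-1]|)
    by projected gradient descent on the dual problem."""
    x = np.asarray(x, dtype=float)
    u = np.zeros(len(x) - 1)   # one dual variable per adjacent difference, |u| <= lam
    tau = 0.25                 # step size; valid because ||D D^T|| < 4
    for _ in range(n_iter):
        m = x - apply_dt(u)    # primal estimate implied by the current dual point
        # Gradient step on 0.5*||x - D^T u||^2, then projection onto the box.
        u = np.clip(u + tau * np.diff(m), -lam, lam)
    return x - apply_dt(u)
```

For $\lambda = 0$ the function returns $x$ unchanged, for very large $\lambda$ it returns the constant mean of $x$, and for a clean unit step with four samples on each side and $\lambda = 0.1$ it returns the levels 0.025 and 0.975, showing the small shrinkage of step heights that the $L_1$ penalty introduces. Convergence of this simple scheme slows for long series, so at high throughput a dedicated solver, such as a QP solver or Condat's direct algorithm for 1D total-variation denoising, would be preferable.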
Here, to make a fair comparison of the performance of these algorithms, we choose these parameters such that the MAE in recovering the known step-like dynamics is minimized. In practice, these parameters would instead be chosen using, for example, an appropriate cross-validation scheme.

The experimental data for this study come from WS8N wild-type Rhodobacter sphaeroides cells. The flagella (tails) are removed, and 0.83 µm beads are attached to the flagellar hook. The beads are then laser-illuminated, and the rotation speed of the flagellar motor is recorded over time. Fig. 2(a) depicts a typical time series from this experimental assay, which is described fully in Pilizota et al. [14].

Example synthetic time series and the algorithm estimates are shown in Fig. 1(b), (c) and (d). As can be seen, the Bayesian Laplace mixture prior median filter is the most accurate up to a noise variance of around 0.8. However, this filter requires strong assumptions about the step positions, and usually this information is not known in advance. The performance of the algorithms on synthetic stepping time series with unit step height across a range of noise variances is shown in Table 1. The median filter has the worst overall performance, with errors reaching as much as 20% of the step height. By contrast, the two novel Bayesian filters readily achieve errors of less than 10% of the step height.

Fig. 2 shows a typical experimental time series and the results of applying the three step-smoothing filters. As can be seen, the time series has fairly long periods of constant rotation (obscured by noise), but it also has periods where the speed changes rapidly from stopped to upwards of 100 Hz. Noise removal is therefore extremely challenging for classical running filters such as the median filter, which generally cannot cope with several large changes occurring within a single window.
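The oracle tuning described above can be sketched as a grid search: on synthetic data, where $\mu_n$ is known, each candidate window length $W$ is scored by the MAE of the running median estimate and the lowest-scoring $W$ is kept. The function names, the candidate grid and the test signal are illustrative assumptions; the same loop would apply to $\lambda$ for the fused LASSO. On real data, where $\mu_n$ is unknown, the error against the truth would have to be replaced by, for example, an error on held-out samples within a cross-validation scheme.

```python
import numpy as np


def running_median(x, W):
    """Centred running median with odd window length W and edge-repeat padding."""
    padded = np.pad(np.asarray(x, dtype=float), W // 2, mode="edge")
    return np.median(np.lib.stride_tricks.sliding_window_view(padded, W), axis=1)


def mae(estimate, truth):
    """Mean absolute error between an estimate and the known step signal."""
    return float(np.mean(np.abs(np.asarray(estimate) - np.asarray(truth))))


def oracle_window(x, mu, candidate_W):
    """Pick the window length whose running-median estimate has the lowest MAE
    against the known signal mu (only possible for synthetic data)."""
    scores = {W: mae(running_median(x, W), mu) for W in candidate_W}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mu = np.repeat([0.0, 1.0, 0.0, 2.0], 50)      # hypothetical step signal
    x = mu + rng.normal(0.0, 0.3, mu.size)
    best, scores = oracle_window(x, mu, range(1, 32, 2))
    print(best, scores[best])
```

With unit step height, the MAE is directly a fraction of the step height, which is how the errors quoted above (for example, less than 10% of the step height) are expressed.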
The Bayesian adaptation of the median filter performs better: it not only produces the smoothest estimates of the constant rotation periods, but also responds better to the rapid speed changes than the median filter. The $L_1$-regularized fused-LASSO global filter is capable of responding to the rapid speed changes, but is not as effective as the Bayesian median filter at smoothing the constant rotation periods.

We studied noise removal for time series recorded from experimental assays of the step-like dynamics of molecular machines. We introduced two novel Bayesian filters, both of which outperformed the classical running median filter in accurately recovering the underlying stepping behaviour in synthetic time series corrupted by noise of increasing variance. We also demonstrated that these two new filters are capable of recovering rapid stepping combined with piecewise-smooth segments obscured by noise, whereas the classical median filter fails in this situation. Given its poor accuracy, the median filter is only useful where the noise variance is low relative to the step height. Overall, then, we might prefer the Bayesian $L_1$-regularized fused-LASSO filter, because it produces results of similar accuracy to the Bayesian Laplace mixture prior median filter, but has only one hyperparameter and does not require the step positions to be known in advance.

As demonstrated here, analysing time series from molecular machines opens up interesting new challenges in signal processing. Furthermore, the volume of experimental data being generated is growing rapidly. Classical statistical signal processing tools are simple and computationally cheap, and thus well suited to high-throughput analysis, but their mathematical assumptions are fundamentally inappropriate for step- and impulse-like behaviour corrupted by the high noise levels typical of time series from these experimental assays.
Our aim in this paper has been to design novel methods that combine the best of both worlds: computational robustness and simplicity with accuracy for the problem at hand.
