Multi-Integration of Labels across Categories for Component Identification (MILCCI)
Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a trial in a neuroscience study may be linked to a value from category (a), task difficulty, and category (b), animal choice. A critical challenge in time-series analysis is to understand how these labels are encoded within the multi-trial observations and to disentangle the distinct effect of each label value across categories. Here, we present MILCCI, a novel data-driven method that (i) identifies the interpretable components underlying the data, (ii) captures cross-trial variability, and (iii) integrates label information to understand each category’s representation within the data. MILCCI extends a sparse per-trial decomposition that leverages label similarities within each category, enabling subtle, label-driven cross-trial adjustments in component compositions and distinguishing the contribution of each category. MILCCI also learns each component’s corresponding temporal trace, which evolves over time within each trial and varies flexibly across trials. We demonstrate MILCCI’s performance on both synthetic and real-world examples, including voting patterns, online page-view trends, and neuronal recordings.
💡 Research Summary
The paper introduces MILCCI (Multi‑Integration of Labels across Categories for Component Identification), a novel data‑driven framework for analyzing large collections of time‑series trials that are each annotated with multiple categorical metadata variables. Traditional approaches such as PCA, ICA, demixed PCA (dPCA), or standard tensor factorizations either ignore trial‑to‑trial variability, treat all trials as a single homogeneous matrix, or require a rigid tensor structure that does not explicitly incorporate label information. Consequently, they struggle to disentangle the distinct contributions of several label categories, especially when labels change jointly or when subtle, label‑dependent adjustments in component composition occur.
Core Model.
MILCCI assumes that each latent component belongs to a specific label category (e.g., task difficulty, animal choice). For a given category k, a three‑dimensional tensor A⁽ᵏ⁾ ∈ ℝ^{N×pₖ×|k|} is learned, where N is the number of observed channels (neurons, sensors, etc.), pₖ is the number of components allocated to that category, and |k| is the number of distinct label options within the category. The i‑th slice A⁽ᵏ⁾::i corresponds to the variant of the components when the trial’s label for category k takes the i‑th value.
For each trial m with label L(m) = (L⁽ᵃ⁾(m), L⁽ᵇ⁾(m), …), MILCCI selects the appropriate slice from every category tensor, concatenates them horizontally to form a trial‑specific loading matrix A(L(m)) ∈ ℝ^{N×P} (P = Σₖ pₖ), and models the observed data Y(m) ∈ ℝ^{N×Tₘ} as
Y(m) = A(L(m)) Φ(m) + ε,
where Φ(m) ∈ ℝ^{P×Tₘ} is a trial‑specific temporal trace matrix and ε is i.i.d. Gaussian noise. This construction guarantees that trials sharing the same label tuple use identical loadings, while allowing the temporal traces to vary freely across trials.
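The slice-selection-and-concatenation step above can be sketched in a few lines of NumPy. All shapes, category names, and label values below are invented for illustration; they are not taken from the paper's code.

```python
import numpy as np

# Hypothetical setup: two categories, (a) difficulty with 3 options
# and (b) choice with 2 options, observed over N channels.
rng = np.random.default_rng(0)
N = 10                      # observed channels (neurons, sensors, ...)
p = {"a": 4, "b": 3}        # components allocated per category (p_k)
n_opts = {"a": 3, "b": 2}   # label options per category (|k|)

# One loading tensor A^(k) of shape (N, p_k, |k|) per category.
A = {k: rng.standard_normal((N, p[k], n_opts[k])) for k in p}

def trial_loadings(labels):
    """Concatenate the label-selected slice of every category tensor."""
    return np.concatenate([A[k][:, :, labels[k]] for k in sorted(A)], axis=1)

# Trial m: difficulty option 2, choice option 0, with T_m = 50 time bins.
T_m = 50
A_m = trial_loadings({"a": 2, "b": 0})            # (N, P), P = 4 + 3
Phi_m = rng.standard_normal((A_m.shape[1], T_m))  # trial-specific traces
Y_m = A_m @ Phi_m + 0.01 * rng.standard_normal((N, T_m))
```

Because the loading matrix is rebuilt per trial from the label tuple, any two trials with identical labels share loadings exactly, while each trial keeps its own `Phi_m`.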
Sparsity and Label‑Similarity Regularization.
Component memberships are encouraged to be sparse via a Laplace prior (equivalently, an L₁ penalty with weight γ₁). Moreover, MILCCI leverages the intuition that variants within the same category should be similar when their label values are similar. A pre‑computed label‑similarity graph λ⁽ᵏ⁾ ∈ ℝ^{|k|×|k|} encodes either binary similarity for categorical variables or distance‑based similarity for ordinal variables. During learning, a term γ₂ ∑_{i≠j} λ⁽ᵏ⁾ᵢⱼ ‖A⁽ᵏ⁾::i − A⁽ᵏ⁾::j‖² penalizes large deviations between related variants, thus enforcing smooth transitions across label space.
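Both ingredients can be sketched as follows, assuming a Gaussian kernel over label distances for ordinal variables and a uniform (binary) graph for categorical ones. The function names and the kernel choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def similarity_graph(values, ordinal=False, sigma=1.0):
    """Build lambda^(k): binary for categorical labels, distance-based for ordinal."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    if ordinal:
        lam = np.exp(-d**2 / (2 * sigma**2))  # closer label values -> more similar
    else:
        lam = np.ones_like(d)                 # all distinct options equally related
    np.fill_diagonal(lam, 0.0)                # only i != j pairs are penalized
    return lam

def similarity_penalty(A_k, lam, gamma2=0.1):
    """gamma2 * sum_{i != j} lam_ij * ||A^(k)_::i - A^(k)_::j||_F^2."""
    total = 0.0
    n = A_k.shape[2]
    for i in range(n):
        for j in range(n):
            if i != j:
                total += lam[i, j] * np.sum((A_k[:, :, i] - A_k[:, :, j]) ** 2)
    return gamma2 * total
```

For an ordinal category such as task difficulty, adjacent difficulty levels get a larger λ⁽ᵏ⁾ᵢⱼ and are therefore pulled closer together than distant ones.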
Learning Procedure.
MILCCI proceeds in three stages: (1) compute λ⁽ᵏ⁾ for all categories; (2) initialize tensors and traces (details in the appendix); (3) iteratively update components and traces until convergence.
Component Update. For each label option i of category k, all trials that exhibit that option (irrespective of their other categories) are gathered. For each such trial, a residual matrix Ỹ(m,k) = Y(m) − Σ_{k′≠k} A⁽ᵏ′⁾::L⁽ᵏ′⁾(m) Φ⁽ᵏ′⁾(m) is formed, where Φ⁽ᵏ′⁾(m) denotes the rows of Φ(m) belonging to category k′; this removes the contributions of all other categories. The variant A⁽ᵏ⁾::i is then estimated by solving a LASSO problem that balances data fidelity, sparsity (γ₁), and similarity to the other variants (γ₂ λ⁽ᵏ⁾). Non‑negativity constraints can be imposed when appropriate. After solving, each component slice is normalized to resolve the scaling ambiguity with Φ(m).
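A simplified version of this update can be written with scikit-learn's `Lasso` as a stand-in solver. This sketch keeps only the data-fidelity and L₁ terms; the similarity penalty (γ₂) and the optional non-negativity constraint of the full method are omitted, and the trial representation is an assumption of ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def update_variant(trials, A, k, i, gamma1=0.01):
    """Re-fit A^(k)_::i from all trials whose category-k label equals i.

    trials: list of (Y_m, labels_m, Phi_m) tuples;
    A: dict of per-category loading tensors, each of shape (N, p_k, |k|).
    """
    cats = sorted(A)
    p = {kk: A[kk].shape[1] for kk in cats}
    off = {kk: sum(p[c] for c in cats if c < kk) for kk in cats}
    X_rows, Y_rows = [], []
    for Y_m, labels, Phi_m in trials:
        if labels[k] != i:
            continue
        # Residual: subtract every other category's reconstruction.
        resid = Y_m.copy()
        for kk in cats:
            if kk != k:
                resid -= A[kk][:, :, labels[kk]] @ Phi_m[off[kk]:off[kk] + p[kk]]
        # Category-k rows of Phi(m) act as regressors for the residual.
        X_rows.append(Phi_m[off[k]:off[k] + p[k]].T)  # (T_m, p_k)
        Y_rows.append(resid.T)                        # (T_m, N)
    lasso = Lasso(alpha=gamma1, fit_intercept=False)
    lasso.fit(np.vstack(X_rows), np.vstack(Y_rows))
    variant = lasso.coef_                             # (N, p_k) sparse loadings
    # Normalize each component column to resolve the scale ambiguity.
    norms = np.linalg.norm(variant, axis=0, keepdims=True)
    return variant / np.maximum(norms, 1e-12)
```

Stacking time bins across all matching trials turns the per-option fit into a single multi-target LASSO regression, which is why trials with other labels in the remaining categories can still be pooled.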
Trace Update. With A(L(m)) fixed, Φ(m) is updated per trial by minimizing a composite objective consisting of (i) reconstruction error, (ii) a temporal smoothness penalty γ₃ ∑ₜ ‖Φₜ − Φₜ₋₁‖², where Φₜ denotes the t‑th column of Φ(m), and (iii) a decorrelation term weighted by γ₄ that discourages excessive correlation among different component traces within the same trial. This yields trial‑specific, smooth temporal dynamics that are not forced to follow any label pattern.
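If the decorrelation term is dropped, the smoothness-regularized trace update has a closed form: setting the gradient to zero yields a Sylvester equation, which SciPy solves in one call. This is a sketch under that simplification (plus a small ridge for numerical stability), not the paper's full objective.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def update_traces(Y_m, A_m, gamma3=1.0):
    """Minimize ||Y - A Phi||_F^2 + gamma3 * sum_t ||Phi_t - Phi_{t-1}||^2."""
    P, T = A_m.shape[1], Y_m.shape[1]
    # First-difference operator: (D @ Phi.T).T stacks column differences.
    D = np.eye(T, k=1)[: T - 1] - np.eye(T)[: T - 1]
    # Stationarity condition: (A^T A) Phi + gamma3 * Phi (D^T D) = A^T Y,
    # a Sylvester equation in Phi. The tiny ridge guards against A^T A
    # being singular when components are collinear.
    G = A_m.T @ A_m + 1e-8 * np.eye(P)
    return solve_sylvester(G, gamma3 * (D.T @ D), A_m.T @ Y_m)
```

Because the equation is solved independently per trial, traces remain fully trial-specific, as the text requires.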
Experimental Validation.
Synthetic experiments demonstrate that MILCCI recovers ground‑truth components and temporal traces far more accurately than PCA, dPCA, PARAFAC, HOSVD, SliceTCA, and a recent cross‑trial model (Mudrik 2024). The method excels at detecting components that subtly change composition when a label changes (e.g., a neuronal ensemble recruiting extra cells under higher task difficulty).
Real‑world case studies include:
Voting Data. US state‑by‑year election results are modeled with labels “party”, “state‑specific issue”, and “year”. MILCCI uncovers components that capture party‑wide trends, issue‑driven shifts, and year‑specific anomalies, matching known historical events (e.g., the 2016 swing).
Online Page‑View Trends. Web traffic logs are annotated by “language”, “device type”, and “time‑of‑day”. The method isolates language‑specific seasonal patterns, distinguishes mobile vs. desktop usage, and reveals cross‑category interactions that standard factor models miss.
Neural Recordings. Multi‑region electrophysiology from mice performing a two‑alternative forced‑choice task is labeled by “task difficulty” (easy/hard) and “choice” (left/right). MILCCI identifies a core decision‑related ensemble present in all trials, plus a difficulty‑dependent sub‑ensemble that recruits additional neurons only during hard trials. This nuanced finding is invisible to dPCA, which would either merge the two effects or force a single static component.
Strengths, Limitations, and Future Directions.
MILCCI’s primary strength lies in its explicit, label‑conditioned tensor representation, which allows components to be both interpretable (sparse, category‑specific) and adaptable (smoothly varying across label values). The incorporation of label similarity graphs enables handling of both categorical and ordinal metadata without manual engineering. Moreover, the framework naturally accommodates trials of varying length, a common obstacle for many tensor methods.
Limitations include computational scaling: the number of tensor slices grows with the product of label cardinalities, potentially leading to memory and runtime bottlenecks for datasets with many categories or high‑cardinality labels. The current linear mixing assumption (Y = A Φ) may also be insufficient for strongly non‑linear dynamics; extending MILCCI with kernelized or deep generative components is an open avenue. Finally, uncertainty quantification is limited to point estimates; a fully Bayesian treatment could provide credible intervals for component loadings and traces.
The authors suggest future work on (i) modeling higher‑order interactions between categories via additional tensor modes, (ii) developing stochastic or variational inference schemes to scale to massive label spaces, and (iii) integrating Bayesian posterior analysis for robust inference.
In summary, MILCCI offers a powerful, flexible, and interpretable solution for dissecting multi‑label, multi‑trial time‑series data. By marrying sparse, category‑specific component tensors with label‑similarity regularization and trial‑specific temporal dynamics, it surpasses existing dimensionality‑reduction and tensor‑factorization techniques in both recovery accuracy and scientific insight.