Continuous Time Dynamic Topic Models
In this paper, we develop the continuous time dynamic topic model (cDTM). The cDTM is a dynamic topic model that uses Brownian motion to model the latent topics through a sequential collection of documents, where a “topic” is a pattern of word use that we expect to evolve over the course of the collection. We derive an efficient variational approximate inference algorithm that takes advantage of the sparsity of observations in text, a property that lets us easily handle many time points. In contrast to the cDTM, the original discrete-time dynamic topic model (dDTM) requires that time be discretized. Moreover, the complexity of variational inference for the dDTM grows quickly as time granularity increases, a drawback which limits fine-grained discretization. We demonstrate the cDTM on two news corpora, reporting both predictive perplexity and the novel task of time stamp prediction.
💡 Research Summary
The paper introduces the Continuous‑time Dynamic Topic Model (cDTM), a novel approach for modeling how topics evolve over time in a collection of documents. Traditional dynamic topic models, such as the discrete‑time Dynamic Topic Model (dDTM), require the time axis to be discretized into fixed intervals. While this works for coarse granularity, it becomes problematic when finer temporal resolution is needed: the number of latent variables grows linearly with the number of time slices, leading to prohibitive memory and computational costs. Moreover, the assumption that documents within a slice are exchangeable may be unrealistic at very fine scales.
cDTM addresses these issues by treating each topic’s natural parameters (the log‑probabilities of words) as a continuous‑time stochastic process governed by Brownian motion. Formally, for each topic k and word w, the parameter β_{t,k,w} follows a Gaussian random walk: β_{0,k,w} ∼ N(m, v₀) and β_{t,k,w} | β_{t‑1,k,w} ∼ N(β_{t‑1,k,w}, v·Δs_t), where Δs_t is the elapsed time between consecutive documents. This construction ensures that variance grows linearly with elapsed time, but no intermediate states need to be represented when there are gaps in the data.
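A minimal numerical sketch of this prior (function and parameter names here are hypothetical, not from the paper) simulates one topic‑word trajectory over irregularly spaced timestamps; note that a long gap contributes its whole variance in a single step, with no intermediate states:

```python
import numpy as np

def sample_brownian_trajectory(timestamps, m=0.0, v0=1.0, v=0.1, seed=0):
    """Sample beta_t for one (topic, word) pair along irregular timestamps.

    The variance of each increment grows linearly with the elapsed
    time Delta s_t between consecutive observations, so gaps in the
    data need no intermediate states.
    """
    rng = np.random.default_rng(seed)
    beta = [rng.normal(m, np.sqrt(v0))]               # beta_0 ~ N(m, v0)
    for dt in np.diff(timestamps):
        # beta_t | beta_{t-1} ~ N(beta_{t-1}, v * Delta s_t)
        beta.append(rng.normal(beta[-1], np.sqrt(v * dt)))
    return np.array(beta)

# Irregular gaps: the 10-unit jump contributes variance v * 10 in one step.
traj = sample_brownian_trajectory([0.0, 1.0, 2.0, 12.0, 13.0])
```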
The generative process mirrors that of LDA: for each document d_t at timestamp s_t, a topic proportion vector θ_t is drawn from a Dirichlet prior, each word’s topic assignment z_{t,n} is sampled from θ_t, and the observed word w_{t,n} is drawn from a multinomial whose natural parameters are the current β_{t,k}. The mapping π(β) = softmax(β) converts unconstrained log‑probabilities to a valid simplex distribution.
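This generative process can be sketched for a single document as follows (hypothetical names; in the full model the β parameters are shared across the corpus and evolve over time):

```python
import numpy as np

def softmax(beta):
    """pi(beta): map natural parameters to a point on the simplex."""
    e = np.exp(beta - beta.max())
    return e / e.sum()

def generate_document(beta_t, alpha, n_words, seed=0):
    """Generate one document at time t given topic parameters beta_t.

    beta_t: (K, V) array of per-topic natural parameters at this time.
    alpha:  (K,) Dirichlet prior on topic proportions.
    """
    rng = np.random.default_rng(seed)
    K, V = beta_t.shape
    theta = rng.dirichlet(alpha)                    # theta_t ~ Dir(alpha)
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)                  # z_{t,n} ~ Mult(theta_t)
        w = rng.choice(V, p=softmax(beta_t[z]))     # w_{t,n} ~ Mult(pi(beta_{t,z}))
        words.append(w)
    return theta, words

theta, words = generate_document(np.zeros((3, 8)), alpha=np.ones(3), n_words=20)
```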
Inference is performed via variational Bayes. The variational distribution factorizes over document‑level variables (θ_t, z_{t,n}) and over the continuous‑time trajectories of each topic β_{1:T,k}. For the latter, the authors adapt variational Kalman filtering: the variational parameters β̂_{t,k,w} act as noisy observations of the true β_{t,k,w}. Crucially, because most words are absent at any given time point, the model can exploit sparsity: if a word w is not observed at time t, the corresponding variational observation is omitted, and the forward‑backward Gaussian updates simply propagate the previous mean and variance forward (m_{t,w}=m_{t‑1,w}, V_{t,w}=V_{t‑1,w}+v·Δs_t). This “sparse variational inference” reduces the memory requirement from O(V·T) (dense representation) to O(∑_t |V_t|), where |V_t| is the number of distinct words observed at time t.
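The sparse forward pass can be illustrated with a one‑dimensional Kalman filter sketch for a single word's trajectory (hypothetical names; the paper's updates also involve per‑time variational variances, collapsed here to a single observation-noise value for brevity):

```python
import numpy as np

def sparse_forward_filter(obs, obs_var, dts, m0=0.0, V0=1.0, v=0.1):
    """Forward Kalman pass for one word's trajectory.

    obs[t] is the variational pseudo-observation beta_hat at time t,
    or None when the word is unseen there; obs_var is its noise
    variance; dts[t] is the elapsed time Delta s_t. Unobserved steps
    run only the prediction update, so state needs to be stored just
    where the word actually appears.
    """
    m, V = m0, V0
    means, variances = [], []
    for y, dt in zip(obs, dts):
        # Prediction: variance inflates by v * dt (Brownian increment).
        V = V + v * dt
        if y is not None:
            # Correction using the variational pseudo-observation.
            gain = V / (V + obs_var)
            m = m + gain * (y - m)
            V = (1.0 - gain) * V
        # else: m_t = m_{t-1}, V_t = V_{t-1} + v * dt  (carry forward)
        means.append(m)
        variances.append(V)
    return means, variances

means, variances = sparse_forward_filter(
    obs=[0.5, None, None, 1.0], obs_var=0.2, dts=[1.0, 1.0, 5.0, 1.0])
```

Over the unobserved gap the mean stays flat while the variance keeps growing, exactly the carry‑forward behavior described above.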
The variational objective L(β̂) is itself a lower bound on the log likelihood, obtained via Jensen’s inequality; maximizing this bound reduces to computing gradients only for the observed (δ_{t,w}=1) entries. The authors employ a conjugate‑gradient optimizer to update the β̂ parameters.
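As an illustration of this optimization pattern (a toy quadratic stand‑in for the actual bound, with hypothetical names and penalty), linear conjugate gradients can minimize an objective whose data term touches only the observed entries of one word's trajectory:

```python
import numpy as np

def conjugate_gradient(A, b, x0, iters=100, tol=1e-12):
    """Linear conjugate gradients: solve A x = b for symmetric PD A."""
    x, r = x0.copy(), b - A @ x0
    p = r.copy()
    for _ in range(iters):
        rs = r @ r
        if rs < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        p = r + (r @ r / rs) * p
    return x

# Toy quadratic stand-in for the bound over one word's trajectory
# beta_hat_{1:T}: a data term active only at observed times
# (delta_{t,w}=1) plus a Brownian-style smoothness penalty.
T, lam = 6, 0.5
observed = np.array([0, 2, 5])
targets = np.zeros(T)
targets[observed] = [0.3, -0.1, 0.8]
S = np.zeros((T, T))
S[observed, observed] = 1.0        # only observed entries enter the data term
D = np.diff(np.eye(T), axis=0)     # first-difference operator
A = S + lam * D.T @ D              # Hessian of the quadratic objective
beta_hat = conjugate_gradient(A, b=S @ targets, x0=np.zeros(T))
```

Because the data term vanishes at unobserved times, the gradient there comes only from the smoothness penalty, mirroring how the paper's gradients involve just the δ_{t,w}=1 entries.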
Experimental evaluation uses two news corpora, including a subset of the TREC AP collection. The authors report two metrics: predictive perplexity (lower is better) and a novel timestamp‑prediction task, where the model must infer the creation time of a held‑out document. Results show that cDTM achieves comparable or lower perplexity than dDTM while using significantly less memory, especially when the time granularity is fine (e.g., hourly or daily). In the timestamp‑prediction task, cDTM outperforms dDTM, demonstrating that continuous‑time modeling captures temporal dynamics more faithfully.
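The timestamp‑prediction task can be sketched as a maximum‑likelihood scan over candidate times (a single‑topic simplification with hypothetical names; the actual model evaluates the held‑out document under the full topic mixture at each time):

```python
import numpy as np

def log_softmax(beta):
    """Numerically stable log of pi(beta)."""
    return beta - np.logaddexp.reduce(beta)

def predict_timestamp(word_counts, beta_by_time):
    """Score each candidate time by the held-out document's
    log-likelihood under that time's word distribution, and
    return the best-scoring timestamp.

    word_counts:  (V,) word counts for the held-out document.
    beta_by_time: dict mapping timestamp -> (V,) natural parameters.
    """
    scores = {t: word_counts @ log_softmax(beta)
              for t, beta in beta_by_time.items()}
    return max(scores, key=scores.get)

# Two candidate times whose distributions favor different words;
# a document dominated by word 2 should map to the later time.
beta_by_time = {0.0: np.array([2.0, 0.0, 0.0]),
                1.0: np.array([0.0, 0.0, 2.0])}
predicted = predict_timestamp(np.array([0.0, 0.0, 5.0]), beta_by_time)
```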
Key contributions of the paper are: (1) introducing a continuous‑time formulation for dynamic topic models that eliminates the need for arbitrary discretization; (2) developing a sparse variational Kalman filtering algorithm that leverages the inherent sparsity of text to achieve scalable inference; (3) empirically validating that cDTM provides both computational advantages and improved predictive performance on real‑world data. The work opens avenues for further extensions, such as incorporating non‑linear diffusion processes, modeling interactions among multiple topics, and handling irregularly spaced timestamps in other domains like scientific literature or social media streams.