Modeling Word Relatedness in Latent Dirichlet Allocation
The standard LDA model suffers from the assumption that the topic assignment of each word is independent, so correlations between words are neglected. To address this problem, in this paper we propose Word Related Latent Dirichlet Allocation (WR-LDA), a model that incorporates word correlation into LDA. This enables capabilities that standard LDA lacks, such as estimating topics for infrequently occurring words and multi-language topic modeling. Experimental results demonstrate the effectiveness of our model compared with standard LDA.
💡 Research Summary
The paper “Modeling Word Relatedness in Latent Dirichlet Allocation” introduces Word Related Latent Dirichlet Allocation (WR‑LDA), a modification of the classic Latent Dirichlet Allocation (LDA) that explicitly incorporates word‑to‑word semantic relationships. The authors begin by pointing out a fundamental limitation of standard LDA: the topic assignment for each token is conditionally independent of all other tokens, which means that synonymous or semantically related words are treated as unrelated unless they co‑occur in the same context. This shortcoming becomes especially problematic for low‑frequency words, for synonym handling, and for cross‑lingual scenarios where words from different languages never appear together.
To address this, WR‑LDA constructs an undirected graph G = (V, E) over the vocabulary V. Each edge (w, w′) carries a weight κ₍w,w′₎ that quantifies the similarity between the two words. The similarity can be derived from external resources such as synonym dictionaries, word‑embedding cosine similarity, or bilingual lexicons. The central regularization term is a graph‑harmonic loss:
R(β) = ½ ∑₍w,w′₎∈E ∑ₖ κ₍w,w′₎ (βₖw − βₖw′)²
where β is the K‑by‑V matrix of topic‑word distributions and βₖw is the probability of generating word w from topic k. This loss penalizes large differences between the probabilities that a single topic assigns to similar words, encouraging smoothness of β over the graph.
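To make this concrete, here is a minimal sketch in Python/NumPy. The `word_graph_from_embeddings` helper is an illustrative assumption (cosine similarity with an arbitrary threshold), not the paper's prescribed similarity source; the regularizer itself uses the standard graph-Laplacian identity R(β) = trace(β L βᵀ) with L = D − κ, which matches the loss above when the sum runs over ordered pairs.

```python
import numpy as np

def word_graph_from_embeddings(emb, threshold=0.5):
    # Hypothetical similarity source: cosine similarity between word
    # embeddings, thresholded so weakly related pairs carry no edge.
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    kappa = norm @ norm.T
    np.fill_diagonal(kappa, 0.0)    # no self-edges
    kappa[kappa < threshold] = 0.0  # sparsify the graph
    return kappa

def harmonic_regularizer(beta, kappa):
    # R(beta) = 1/2 * sum_{(w,w')} sum_k kappa_{w,w'} (beta_kw - beta_kw')^2,
    # computed via the Laplacian identity R = trace(beta @ L @ beta.T),
    # where L = D - kappa and D is the diagonal degree matrix.
    laplacian = np.diag(kappa.sum(axis=1)) - kappa
    return np.trace(beta @ laplacian @ beta.T)

# Toy usage: 10 topics over a 50-word vocabulary, 8-dim embeddings.
rng = np.random.default_rng(0)
beta = rng.dirichlet(np.ones(50), size=10)  # each topic row sums to 1
emb = rng.normal(size=(50, 8))
print(harmonic_regularizer(beta, word_graph_from_embeddings(emb)))
```

The Laplacian form avoids materializing all V² pairwise differences per topic, which matters once the vocabulary is realistically large.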
The overall objective combines the standard LDA log‑likelihood L(α, β) with the regularizer:
O(α, β) = λ L(α, β) − (1 − λ) R(β)
The scalar λ ∈ [0, 1] balances the two terms: λ = 1 recovers standard LDA, while smaller values enforce stronger agreement between the topic probabilities of related words.
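The summary does not detail how this objective is optimized. The sketch below assumes the simplest possible scheme: evaluate O given a log‑likelihood value from any standard LDA inference routine, and smooth β with one gradient step on R followed by row renormalization. The function names, the step size, and the post‑M‑step placement are all illustrative assumptions, not the paper's stated procedure.

```python
def wr_lda_objective(loglik, beta, kappa, lam=0.7):
    # O(alpha, beta) = lam * L(alpha, beta) - (1 - lam) * R(beta).
    # `loglik` is the standard LDA log-likelihood from whatever inference
    # routine is in use (e.g., variational EM); lam = 1 falls back to
    # unregularized LDA.
    return lam * loglik - (1.0 - lam) * harmonic_regularizer(beta, kappa)

def smooth_m_step(beta_hat, kappa, lam=0.7, step=0.01):
    # Hypothetical smoothing step applied after the usual LDA M-step:
    # since R = trace(beta @ L @ beta.T) with symmetric L, its gradient is
    # dR/dbeta = 2 * beta @ L. Move beta against the gradient, then
    # renormalize each topic row back onto the probability simplex.
    laplacian = np.diag(kappa.sum(axis=1)) - kappa
    beta = beta_hat - step * (1.0 - lam) * 2.0 * (beta_hat @ laplacian)
    beta = np.clip(beta, 1e-12, None)  # keep probabilities positive
    return beta / beta.sum(axis=1, keepdims=True)
```

Under this reading, each iteration trades a small amount of likelihood for smoothness of β over the word graph, which is what lets mass flow from frequent words to their rare or cross‑lingual neighbors.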