Local Term Weight Models from Power Transformations: Development of BM25IR: A Best Match Model based on Inverse Regression

Reading time: 5 minutes

📝 Original Info

  • Title: Local Term Weight Models from Power Transformations: Development of BM25IR: A Best Match Model based on Inverse Regression
  • ArXiv ID: 1608.01573
  • Date: 2016-08-05
  • Authors: Edel Garcia

📝 Abstract

In this article we show how power transformations can be used as a common framework for the derivation of local term weights. We found that under some parametric conditions, BM25 and inverse regression produce equivalent results. As a special case of inverse regression, we show that the largest increment in term weight occurs when a term is mentioned for the second time. A model based on inverse regression (BM25IR) is presented. Simulations suggest that BM25IR works fairly well for different BM25 parametric conditions and document lengths.

💡 Deep Analysis

Figure 1

📄 Full Content

The weight of a term i in a document j with an occurrence frequency f_i,j is computed by assigning a local (L_i,j), global (G_i), and normalization (N_j) weight. Several models for doing this have been proposed, some using a combination of assumptions, intuitions, and experimental observations (Chisholm & Kolda, 1999; Jones & Furnas, 1987; Lee, Chuang, & Seamons, 1997; MacFarlane, 2001; Robertson, 2004; Robertson & Walker, 1994; Robertson, Walker, Jones, Hancock-Beaulieu, & Gatford, 1994; Robertson & Zaragoza, 2009; Salton & Buckley, 1987; Salton, Wong, & Yang, 1975; Salton & Yang, 1973; Sanderson & Ruthven, 1996).

However, no common framework has been given for their systematic derivation. The purpose of this article is to present such a framework, based on power transformations. We show that many of the local weight models found in the literature, as well as new ones, can be derived in this way.

Typically there are three reasons for modifying a data set through power transformations:

  • To make the distribution of a data set closer to that of a normal distribution.

  • To linearize the relationships between variables.

  • To stabilize the variance.
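As a toy illustration of the second point (my own example, not from the paper), a square-root transform straightens out data generated by a quadratic relationship:

```python
import math

# y = x^2 grows nonlinearly in x; a power transform with lambda = 0.5
# (a square root) recovers a straight-line relationship.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
ys_transformed = [math.sqrt(y) for y in ys]
assert ys_transformed == [1.0, 2.0, 3.0, 4.0, 5.0]  # now linear in xs
```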

While it is true that word occurrences in documents have been modeled as belonging to Poisson mixtures, it is also true that good keywords are far from Poisson (Church & Gale, 1995a; 1995b).

Although power transformations help ensure that the assumptions of linearity, normality, and homoscedasticity hold, the main objective is to make inferences on the power transformation parameter, even in cases where no power transformation brings a distribution to normality (Li, 2005). It is in this context that power transformation methods are used in the present article.

The best known power transformation models are due to Tukey (1957) and Box & Cox (1964).

Tukey:

y* = y^λ for λ ≠ 0;  y* = ln(y) for λ = 0 (1)

Box-Cox:

y* = (y^λ - 1)/λ for λ ≠ 0;  y* = ln(y) for λ = 0 (2)

where y is a numerical value, y* is a transformed value, λ is a power parameter that can adopt any real value, and c is a positive constant typically used to offset any negative or zero y value (i.e., y is replaced by y + c where needed).

These transformations are very effective when the data do not describe an inflection point (Hossain, 2011; Steiger, 2009; Sakia, 1992). A comparison of these models is given in Table 1. The difference between the models is that for λ ≠ 0, Box-Cox's model shifts the power term by -1 and normalizes it with the scale 1/λ. For λ = 1 and c = 0, Tukey's model does not change the data, but Box-Cox's subtracts 1. This does not change the results, though.

For λ = 0, both models return logs, but by different means: in the Tukey model, the derivative dy*/dλ is evaluated at λ = 0, whereas in the Box-Cox model l'Hôpital's Rule is applied. In both cases the base of the logarithms does not matter.
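As a concrete sketch, the two transformations (with the λ = 0 log case) can be implemented as follows; the function and parameter names are my own, not from the paper:

```python
import math

def tukey(y, lam, c=0.0):
    # Tukey (1957): y* = (y + c)^lam, with ln(y + c) at lam = 0
    y = y + c
    return math.log(y) if lam == 0 else y ** lam

def box_cox(y, lam, c=0.0):
    # Box & Cox (1964): y* = ((y + c)^lam - 1) / lam, with ln(y + c) at lam = 0
    y = y + c
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam
```

For λ = 1 and c = 0, tukey(y, 1) returns y while box_cox(y, 1) returns y - 1, and as λ approaches 0 both converge to the log, as discussed above.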

Among others, the following transformations are obtained from both models by setting λ accordingly:

  • square root (λ = 0.5)

  • inverse square root (λ = -0.5)

Nowadays Box-Cox transformations are preferred over Tukey's, so it is not surprising to see them applied in IR. For instance, Gerani, Zhai, & Crestani (2012) used the transformations in relevance ranking work. Molina, Torres-Moreno, SanJuan, Sierra, & Rojas-Mora (2013) applied Box-Cox transformations to low term frequencies. Lv & Zhai (2011) and Zhou (2014) have used them to overcome problems associated with document lengths in BM25 models.

However, at the time of writing, these transformations have not been used as a framework for systematically deriving local weights. This is precisely the purpose of the present article.

Some of the models derived below are discussed by Chisholm & Kolda (1999) and in a previous tutorial (Garcia, 2016). For consistency's sake with those reports, we adopt the following conventions: y* is replaced with L_i,j, y with f_i,j, λ with p, c with k, and ln with log, where log denotes base-2 logarithms, although the base used does not really matter. Table 2 lists the models derived from the Tukey and Box-Cox power transformations for p = 2, 1, 1/2, 0, -1/2, -1, and -2, where k > 0.

A close look at Table 2 reveals that for f_i,j + k = 1 and p ≠ 0, Tukey's model returns 1, whereas for f_i,j + k = 1 and any value of p, Box-Cox's model returns 0. Other combinations of p and k produce more interesting solutions.
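This observation is easy to check numerically. The sketch below applies the substitutions above (y → f_i,j + k, λ → p, base-2 logs); the helper names are mine:

```python
import math

def tukey_weight(f, k, p):
    # Tukey form after substituting y -> f + k and lambda -> p
    return math.log(f + k, 2) if p == 0 else (f + k) ** p

def box_cox_weight(f, k, p):
    # Box-Cox form with the same substitutions
    return math.log(f + k, 2) if p == 0 else ((f + k) ** p - 1.0) / p

# When f + k = 1: Tukey returns 1 for any p != 0, Box-Cox returns 0 for any p.
f, k = 0, 1
for p in (2, 1, 0.5, -0.5, -1, -2):
    assert tukey_weight(f, k, p) == 1.0
    assert box_cox_weight(f, k, p) == 0.0
assert box_cox_weight(f, k, 0) == 0.0  # log(1) = 0
```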

For instance, the following term weight models (Chisholm & Kolda, 1999;Garcia, 2016) are derived from Table 2 after some slight modifications.

The local term weight component of BM25 is defined as

L_i,j = (k_1 + 1) f_i,j / (K + f_i,j)

where K = k_1((1 - b) + b(dl_j/avdl)) for tuning constants k_1 and b, document length dl_j, and average document length avdl.

In general, the relationship between L_i,j and f_i,j can be traced back to a precursor formula of the general form

L_i,j = f_i,j / (f_i,j + k) for some k > 0 (6)

where (6) has been described as an approximation of a mixture of two Poisson distributions; i.e., as an approximation of a 2-Poisson Model (Robertson & Walker, 1994; Baayen, 1993; Church & Gale, 1995; Rennie, 2005; Ogura, Amano, & Kondo, 2013).
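A small numerical sketch (standard BM25 notation assumed; the parameter values are illustrative only) shows that the BM25 local weight is just the precursor form (6) scaled by k_1 + 1 when k = K:

```python
def bm25_local(f, k1=1.2, b=0.75, dl=100.0, avdl=100.0):
    # Standard BM25 local term weight with length-normalizing K
    K = k1 * ((1.0 - b) + b * dl / avdl)
    return (k1 + 1.0) * f / (K + f)

def precursor(f, k):
    # Eq. (6): f / (f + k)
    return f / (f + k)

# With dl = avdl, K reduces to k1, and the two curves differ
# only by the constant factor k1 + 1 at every frequency f.
k1 = 1.2
for f in (1, 2, 5, 10, 50):
    ratio = bm25_local(f, k1=k1) / precursor(f, k=k1)
    assert abs(ratio - (k1 + 1.0)) < 1e-12
```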

Revisiting (6), note that for k = 1 it can be rewritten as

L_i,j = 1 - 1/(f_i,j + 1) (7)

where the general form

y = b_0 + b_1/x (8)

describes an inverse regression curve that is increasing asymptotic for b_1 < 0 and decreasing asymptotic for b_1 > 0. Clearly, for k = 1, (6) and (7) return the same results. Since in the formal BM25 model k is K, then for K = 1, L_i,j = (k_1 + 1)(1 - 1/(f_i,j + 1)).
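The "second mention" result highlighted in the abstract follows directly from this inverse-regression form: with w(f) = 1 - 1/(f + 1), successive increments shrink as 1/((f + 1)(f + 2)). A minimal sketch (names are mine):

```python
def w(f):
    # Inverse-regression local weight for k = 1: w(f) = 1 - 1/(f + 1)
    return 1.0 - 1.0 / (f + 1.0)

# Weight gained by each additional mention beyond the first
increments = [w(f + 1) - w(f) for f in range(1, 6)]

# The largest gain among repeat mentions comes from the second one (f: 1 -> 2)
assert increments[0] == max(increments)
assert abs(increments[0] - 1.0 / 6.0) < 1e-12
```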

Figure 1 shows several conceptual differences between (6) and (7).


Reference

This content is AI-processed based on open access ArXiv data.
