Linear Time Series Models for Term Weighting in Information Retrieval
In: Journal of the American Society for Information Science and Technology (Print), vol. 61 (2010), no. 7, pp. 1299-1312
academicJournal
- print, 1 p.1/4
Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize that the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
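The hypothesis in the abstract can be illustrated with a minimal sketch. It assumes a first-order autoregressive model fitted by ordinary least squares and uses `1 - R^2` as a stand-in importance score; the abstract does not specify the model order or the three measures the article derives, so the function names, the score, and the example counts below are all hypothetical.

```python
def ar1_r_squared(series):
    """Fit x_t = a + b * x_{t-1} by ordinary least squares and return R^2.

    Assumption: an AR(1) model stands in for the abstract's "linear model of
    the term's prior observations"; the article may use a different order.
    """
    prev, curr = series[:-1], series[1:]
    n = len(curr)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    syy = sum((c - mean_curr) ** 2 for c in curr)
    if syy == 0:
        return 1.0  # constant target: predicted exactly by the intercept
    if sxx == 0:
        return 0.0  # constant predictor: the slope carries no information
    b = sxy / sxx
    a = mean_curr - b * mean_prev
    ss_res = sum((c - (a + b * p)) ** 2 for p, c in zip(prev, curr))
    return 1.0 - ss_res / syy


def temporal_importance(series):
    """Hypothetical score: a poor linear fit implies a higher term weight."""
    return 1.0 - ar1_r_squared(series)


# Made-up collection frequencies sampled at successive time intervals.
steady_term = [100, 110, 120, 130, 140, 150]  # grows predictably
bursty_term = [0, 9, 0, 0, 12, 0, 1, 0]       # appears in irregular bursts

steady_score = temporal_importance(steady_term)
bursty_score = temporal_importance(bursty_term)
```

Under these assumptions the steadily growing term is fitted almost perfectly and receives a score near zero, while the bursty term is fitted poorly and receives a higher score, which matches the direction of the hypothesis.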
Title: | Linear Time Series Models for Term Weighting in Information Retrieval |
---|---|
Author / Contributor: | EFRON, Miles |
Journal: | Journal of the American Society for Information Science and Technology (Print), vol. 61 (2010), no. 7, pp. 1299-1312 |
Publication: | New York, NY: Wiley, 2010 |
Media type: | academicJournal |
Extent: | print, 1 p.1/4 |
ISSN: | 1532-2882 (print) |