摘要 |
A technique for optimizing the number of terms in a profile used for information extraction. This optimization is performed by estimating the number of terms which will substantively affect the information extraction process. That is, the technique estimates the point in a term weight curve where that curve becomes flat. A term generally is important and remains part of the profile as long as its weight and the weight of the next term may be differentiated. When terms' weights are not differentiable, then they are not significant and may be cut off. Reducing the number of terms used in a profile increases the efficiency and effectiveness of the information retrieval process.
|