Abstract |
Described is a technology for identifying sample data items (e.g., documents corresponding to query-URL pairs) that are most likely to have been mislabeled when previously judged, and for selecting those data items for re-judging. In one aspect, lambda gradient scores (values associated with ranked sample data items that indicate the relative direction in which, and how strongly, each data item should move to lower a ranking cost) are summed over pairs of sample data items to compute a re-judgment score for each of those sample data items. The re-judgment scores indicate a relative likelihood of mislabeling. Once the selected sample data items are re-judged, a new training set is available, from which a new ranker may be trained.
|
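The scoring step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the pairwise lambda takes the RankNet-style form sigma / (1 + exp(sigma * (s_i - s_j))) without any ranking-metric weighting, the magnitudes (rather than signed values) are summed for both items of each pair, and the names `rejudgment_scores`, `select_for_rejudging`, `sigma`, and `k` are hypothetical.

```python
import math


def rejudgment_scores(scores, labels, sigma=1.0):
    """Sum pairwise lambda magnitudes per item for a single query.

    scores: ranker outputs, one per sample data item.
    labels: previously judged relevance labels, one per item.
    Assumption: a RankNet-style pairwise lambda; the source does not
    specify the exact formula or whether signed values are summed.
    """
    n = len(scores)
    totals = [0.0] * n
    for i in range(n):
        for j in range(n):
            # Only pairs where item i was judged more relevant than item j.
            if labels[i] > labels[j]:
                # Large when the ranker disagrees with the labels
                # (item i scored well below item j), small otherwise.
                lam = sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
                totals[i] += lam
                totals[j] += lam
    return totals


def select_for_rejudging(scores, labels, k=1, sigma=1.0):
    """Return indices of the k items with the largest re-judgment scores."""
    totals = rejudgment_scores(scores, labels, sigma)
    order = sorted(range(len(totals)), key=lambda idx: totals[idx], reverse=True)
    return order[:k]


# Item 0 is ranked highest by the ranker but carries the lowest label,
# so it accumulates the largest total and is selected first.
example_scores = [3.0, 2.0, 0.5]
example_labels = [0, 2, 1]
selected = select_for_rejudging(example_scores, example_labels, k=1)  # [0]
```

In this sketch, an item whose label conflicts with the ranker's ordering across many pairs accumulates a large total, which is one way to read "relative likelihood of mislabeling". A signed sum, or a lambda weighted by the change in a ranking metric, would be equally plausible readings of the abstract.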