Principal claim |
1. A method for training a web crawler, wherein the web crawler maintains one or more categories each comprising a set of words, the method comprising:
in response to receiving a query from a user:
selecting, by a processor, at least one hyperlink based on the set of words;
determining, by the processor, a hyperlink score for the at least one hyperlink based on a predetermined category score associated with each of one or more categories and a membership value of the at least one hyperlink for each of the one or more categories;
updating, by the processor, the predetermined category score associated with each of the one or more categories based at least on a discount factor associated with the predetermined category score and an association of learning rate with a measure of contribution of the one or more categories for the selection of the at least one hyperlink and another measure of correctness of the selection of the at least one hyperlink with respect to semantics of the query;
comparing, by the processor, the updated predetermined category score with the hyperlink score to select a category from the one or more categories; and
updating, by the processor, the set of words associated with the category based on content of a web page pointed to by the at least one hyperlink. |
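The claimed steps can be illustrated with a minimal Python sketch. The claim does not fix any formulas, so everything concrete here is an assumption made for illustration: the membership value is taken as the fraction of link tokens found in a category's word set, the hyperlink score as a membership-weighted sum of category scores, the update as `discount * score + learning_rate * contribution * correctness`, and the selection rule as "highest updated score among categories whose updated score is at least the hyperlink score". All names (`Category`, `membership`, `train_step`, and so on) are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    words: set[str] = field(default_factory=set)
    score: float = 0.0  # plays the role of the "predetermined category score"


def membership(link_text: str, category: Category) -> float:
    """Assumed membership value: share of link tokens present in the category's words."""
    tokens = link_text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in category.words for t in tokens) / len(tokens)


def hyperlink_score(link_text: str, categories: list[Category]) -> float:
    """Assumed hyperlink score: membership-weighted sum of category scores."""
    return sum(c.score * membership(link_text, c) for c in categories)


def update_scores(link_text: str, categories: list[Category], correctness: float,
                  discount: float = 0.9, learning_rate: float = 0.5) -> None:
    """Assumed update: discounted old score plus learning_rate * contribution * correctness."""
    memberships = [membership(link_text, c) for c in categories]
    total = sum(memberships)
    for c, m in zip(categories, memberships):
        # Contribution of a category to the selection, normalised across categories.
        contribution = m / total if total else 0.0
        c.score = discount * c.score + learning_rate * contribution * correctness


def select_category(link_score: float, categories: list[Category]) -> Category:
    """Assumed comparison rule: best updated score among those reaching the hyperlink score."""
    candidates = [c for c in categories if c.score >= link_score] or categories
    return max(candidates, key=lambda c: c.score)


def train_step(link_text: str, page_text: str, categories: list[Category],
               correctness: float) -> tuple[float, Category]:
    """One pass over the claimed steps for a single selected hyperlink."""
    score = hyperlink_score(link_text, categories)
    update_scores(link_text, categories, correctness)
    chosen = select_category(score, categories)
    # Extend the chosen category's word set with words from the linked page.
    chosen.words.update(page_text.lower().split())
    return score, chosen
```

In this sketch `correctness` stands in for the claim's measure of how well the selected hyperlink matched the semantics of the query; how that measure is obtained (for example, from user feedback) is left open.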