Independent claim |
1. A method for filtering a candidate document in a candidate document set, wherein the candidate document set comprises at least one candidate document, the method comprising:
receiving one or more entity word-topic word pairs; identifying, by a processor, one or more entity words of the candidate document, wherein the one or more entity words are words indicating focused entities of the candidate document; identifying, based on each identified entity word, one or more topic words related to the identified entity word in the candidate document where the identified entity word is located; determining, by the processor, whether to add the candidate document into a filtered document set using the entity words and topic words in the received entity word-topic word pairs and the identified entity words and topic words in the candidate document; and adding the candidate document into the filtered document set in response to determining that the candidate document should be added into the filtered document set, wherein: each of the entity word-topic word pairs comprises an entity word and a topic word; all entity words in the entity word-topic word pairs form an entity word set; and, for each entity word, all topic words in the entity word-topic word pairs in which that entity word appears form a topic word set corresponding to that entity word. |
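
The steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the claim does not specify how entity words and topic words are identified or what decision rule is applied, so the sketch makes assumptions. The function names (`build_topic_sets`, `identify`, `should_add`, `filter_documents`) are hypothetical. Identification is replaced by a simple stand-in, in which an entity word is any single-word token that appears in the entity word set and its related topic words are the other tokens in the same sentence. The decision rule is assumed to be "add the document if at least one identified entity word has an identified topic word in its corresponding topic word set".

```python
import re
from collections import defaultdict


def build_topic_sets(pairs):
    """Group the received pairs into entity word -> topic word set."""
    topic_sets = defaultdict(set)
    for entity, topic in pairs:
        topic_sets[entity.lower()].add(topic.lower())
    return dict(topic_sets)


def identify(document, entity_word_set):
    """Stand-in identification step (an assumption, not from the claim):
    an entity word is a token found in entity_word_set, and its related
    topic words are the other tokens in the same sentence."""
    found = defaultdict(set)
    for sentence in re.split(r"[.!?]+", document.lower()):
        words = set(re.findall(r"[a-z0-9]+", sentence))
        for entity in words & entity_word_set:
            found[entity] |= words - {entity}
    return dict(found)


def should_add(document, topic_sets):
    """Assumed decision rule: some identified entity word co-occurs with
    a topic word from its corresponding topic word set."""
    identified = identify(document, set(topic_sets))
    return any(topics & topic_sets[entity]
               for entity, topics in identified.items())


def filter_documents(candidates, pairs):
    """Return the filtered document set for the given candidate set."""
    topic_sets = build_topic_sets(pairs)
    return [doc for doc in candidates if should_add(doc, topic_sets)]


# Illustrative data only; the names are made up.
pairs = [("acme", "recall"), ("acme", "lawsuit"), ("globex", "merger")]
docs = [
    "Acme announced a recall of its scooters.",
    "Acme opened a new office. A merger was discussed elsewhere.",
    "Globex confirmed the merger on Monday.",
]
filtered = filter_documents(docs, pairs)
```

With this data, the second document is left out under the assumed rule: "acme" is identified, but no topic word from its topic word set occurs in the same sentence. A real system would likely replace `identify` with named-entity recognition and a relatedness measure.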