Abstract |
<p>The present invention relates to the field of network data communications and discloses a method and device for acquiring a movie and television subject from a webpage. The method comprises: extracting the texts to be mined from a webpage on which movie and television subject mining is to be performed; segmenting the texts to be mined according to a preset segmentation rule to obtain a sentence set; extracting, from the sentence set, the longest common clause shared by all the sentences as a candidate movie and television subject; and determining the movie and television subject of the webpage according to the candidate movie and television subject. The present invention avoids the complicated prior-art process of writing wrappers when a large number of websites need to be parsed, and also overcomes the parsing failures that occur when a webpage structure changes frequently and the changes cannot be detected in real time.</p> |
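The segmentation and longest-common-clause steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function names `split_sentences` and `longest_common_clause`, the punctuation-based segmentation rule, and the reading of "longest common clause" as the longest contiguous substring shared by every sentence are all assumptions made for illustration. The abstract does not specify the preset segmentation rule or how the final subject is chosen from the candidate.

```python
import re


def split_sentences(texts, delimiters=r"[.!?。！？|]+"):
    """Segment each text into sentences.

    Assumption: the 'preset segmentation rule' is a split on
    sentence-ending punctuation and pipes; the abstract does not define it.
    """
    sentences = []
    for text in texts:
        for part in re.split(delimiters, text):
            part = part.strip()
            if part:
                sentences.append(part)
    return sentences


def longest_common_clause(sentences):
    """Return the longest contiguous substring present in every sentence.

    Assumption: 'longest common clause' is read here as the longest
    common substring. Candidates are taken from the shortest sentence,
    longest first, so the first match found is a longest one.
    """
    if not sentences:
        return ""
    shortest = min(sentences, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in sentences):
                return candidate.strip()
    return ""


# Hypothetical texts extracted from a webpage (e.g. title and body text).
texts = [
    "Inception HD online. Inception reviews and cast!",
    "Download Inception subtitles",
]
sentences = split_sentences(texts)
candidate_subject = longest_common_clause(sentences)
```

The brute-force search is quadratic in the length of the shortest sentence, which is acceptable for short page texts such as titles and headings. A suffix-automaton or dynamic-programming approach would be a reasonable substitute for longer inputs.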