Title of invention |
GENERATING DESCRIPTIVE TOPIC LABELS |
Abstract |
A method to generate a topic label for a set of electronic documents may include crawling, by a processor, the set of electronic documents. The method may include extracting a plurality of knowledge points from the set of electronic documents. The method may also include selecting a candidate set of knowledge points from the plurality of knowledge points based on occurrence values. The method may include calculating relatedness scores between the knowledge points in the candidate set. The method may also include calculating hierarchical relationships between the knowledge points in the candidate set. The method may further include calculating a comprehensive score for each knowledge point in the candidate set based on the relatedness scores and the hierarchical relationships. The method may include selecting, from the candidate set of knowledge points, a first candidate knowledge point with the highest comprehensive score as a topic label for the set of electronic documents. |
Publication number |
US2017103074(A1) |
Publication date |
2017.04.13 |
Application number |
US201514880087 |
Filing date |
2015.10.09 |
Applicant |
FUJITSU LIMITED |
Inventors |
WANG Jun; UCHINO Kanji |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Main claim |
1. A method comprising:
crawling, by a processor, a set of electronic documents stored at least temporarily in a non-transitory storage media;
extracting a plurality of knowledge points from the set of electronic documents;
selecting a candidate set of knowledge points from the plurality of knowledge points based on occurrence values of the plurality of knowledge points in the set of electronic documents;
calculating relatedness scores between each knowledge point in the candidate set of knowledge points;
calculating hierarchical relationships between each knowledge point in the candidate set of knowledge points;
calculating comprehensive scores for each knowledge point in the candidate set of knowledge points based on the relatedness scores and the hierarchical relationships; and
selecting, from the set of candidate knowledge points, a first candidate knowledge point that has a highest comprehensive score as a topic label for the set of electronic documents. |
Address |
Kawasaki-shi JP |
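
The claimed steps can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the record does not say how occurrence values, relatedness scores, hierarchical relationships, or comprehensive scores are computed, so every formula below is an assumption chosen for illustration. The sketch skips crawling and extraction and takes each document as a list of already extracted knowledge points. It assumes document frequency as the occurrence value, Jaccard overlap of document sets as relatedness, proper-subset containment of document sets as the hierarchy test, and a weighted sum as the comprehensive score. The names `label_topic`, `min_occurrence`, and `hierarchy_weight` are hypothetical.

```python
from itertools import combinations


def label_topic(docs, min_occurrence=2, hierarchy_weight=0.5):
    """Return (label, scores) for docs, each a list of knowledge points.

    All scoring rules are illustrative assumptions, not taken from the patent.
    """
    # Map each knowledge point to the set of document indices containing it.
    doc_sets = {}
    for index, points in enumerate(docs):
        for point in set(points):
            doc_sets.setdefault(point, set()).add(index)

    # Candidate selection: keep points whose occurrence value
    # (assumed here to be document frequency) meets the threshold.
    candidates = sorted(
        point for point, ids in doc_sets.items() if len(ids) >= min_occurrence
    )
    if not candidates:
        return None, {}

    relatedness = {point: 0.0 for point in candidates}
    narrower_count = {point: 0 for point in candidates}
    for a, b in combinations(candidates, 2):
        docs_a, docs_b = doc_sets[a], doc_sets[b]
        # Relatedness (assumed): Jaccard overlap of the two document sets.
        jaccard = len(docs_a & docs_b) / len(docs_a | docs_b)
        relatedness[a] += jaccard
        relatedness[b] += jaccard
        # Hierarchy (assumed): a point is broader than another when the
        # other's documents form a proper subset of its own documents.
        if docs_b < docs_a:
            narrower_count[a] += 1
        elif docs_a < docs_b:
            narrower_count[b] += 1

    # Comprehensive score (assumed): total relatedness plus a weighted
    # count of narrower points under this one.
    scores = {
        point: relatedness[point] + hierarchy_weight * narrower_count[point]
        for point in candidates
    }
    # Ties resolve to the alphabetically first candidate.
    best = max(candidates, key=scores.get)
    return best, scores
```

Under these assumptions a broad knowledge point that co-occurs with many narrower ones receives the highest score, which matches the intent of choosing a label that describes the whole document set. A real system would replace each assumed formula with the measures defined in the full specification.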