Title of Invention |
DEVICES, SYSTEMS, AND METHODS FOR RESOLVING NAMED ENTITIES |
Abstract |
An information processing apparatus to select a token from a document to describe a field of interest includes an obtaining unit, a determining unit, a clustering unit, and a selecting unit. The obtaining unit obtains a list of tokens output from extractors that received the document as an input. Each output token has an extractor score assigned by an extractor. The determining unit determines, as a word frequency value, a frequency of each word in the list of tokens, determines a token score for each token in the list of tokens, and determines a distance between each pair of tokens in the list of tokens. The clustering unit clusters the tokens in the list of tokens into a plurality of groups. The selecting unit selects a token within a group of the plurality of groups to describe the field of interest in the document. |
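The abstract names the processing stages but not their formulas. The following is a minimal Python sketch of one way the stages could fit together, not the applicant's implementation. The scoring rule (extractor score weighted by mean word frequency), the distance measure (one minus the `difflib` similarity ratio), the single-pass clustering, the `max_distance` threshold, and all function names are assumptions introduced for illustration. Tokens are assumed to be non-empty strings.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_frequencies(tokens):
    """Count how often each word occurs across all extracted tokens."""
    return Counter(w.lower() for text, _ in tokens for w in text.split())


def token_score(text, extractor_score, freq):
    # Assumption: weight the extractor score by the mean frequency of the
    # token's words. The abstract does not specify the actual formula.
    words = text.lower().split()
    return extractor_score * sum(freq[w] for w in words) / len(words)


def distance(a, b):
    # Assumption: string distance is 1 minus difflib's similarity ratio.
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster(tokens, max_distance=0.5):
    """Single pass: join the first cluster whose first member is close enough."""
    clusters = []
    for tok in tokens:
        for members in clusters:
            if distance(tok[0], members[0][0]) <= max_distance:
                members.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters


def select(tokens, max_distance=0.5):
    """Pick the best-scoring token from the cluster with the highest total score."""
    freq = word_frequencies(tokens)

    def score(tok):
        return token_score(tok[0], tok[1], freq)

    best = max(cluster(tokens, max_distance), key=lambda c: sum(map(score, c)))
    return max(best, key=score)[0]


# (token text, extractor score) pairs, as if produced by several extractors.
tokens = [("Canon Inc", 0.6), ("Canon", 0.5), ("Tokyo", 0.9)]
print(select(tokens))  # Canon
```

In this example "Canon Inc" and "Canon" fall into one cluster, whose combined score outweighs the single high-scoring "Tokyo" token.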
Publication Number |
US2017060837(A1) |
Publication Date |
2017.03.02 |
Application Number |
US201615253548 |
Filing Date |
2016.08.31 |
Applicant |
CANON KABUSHIKI KAISHA |
Inventors |
Dusberger Dariusz T.;Dietz Quentin |
Classification |
G06F17/27;G06F17/30 |
Main Classification |
G06F17/27 |
Agency |
|
Agent |
|
Principal Claim |
1. A method for an information processing apparatus to select a token from a document to describe a field of interest in the document, the method comprising:
obtaining a list of tokens output from a plurality of extractors that received the document as an input, wherein each output token has an extractor score assigned by an extractor of the plurality of extractors;
merging the tokens in the list of tokens into a plurality of groups, wherein each group in the plurality of groups includes tokens whose word-tokenized form is a fuzzy sublist/superlist of one another;
adding the extractor score of each token in a group to determine a group score for each of the plurality of groups;
selecting the group with the highest group score from the plurality of groups; and
selecting a token within the selected group to describe the field of interest in the document. |
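The claimed steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method as actually implemented. The claim does not define "fuzzy" matching or how the final token is chosen within the selected group, so the sketch assumes that two words match when their `difflib.SequenceMatcher` ratio is at least a hypothetical `threshold`, that a sublist is a contiguous run of words, and that the token with the highest extractor score in the winning group is selected. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def words_match(a, b, threshold=0.8):
    # Assumption: "fuzzy" means a similarity ratio at or above the threshold.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_fuzzy_sublist(short, long, threshold=0.8):
    """True if `short` occurs as a contiguous run in `long`, comparing words fuzzily."""
    n = len(short)
    if n == 0 or n > len(long):
        return False
    return any(
        all(words_match(s, w, threshold) for s, w in zip(short, long[i:i + n]))
        for i in range(len(long) - n + 1)
    )


def related(a, b, threshold=0.8):
    """True if either token's word list is a fuzzy sublist of the other's."""
    wa, wb = a.split(), b.split()
    return is_fuzzy_sublist(wa, wb, threshold) or is_fuzzy_sublist(wb, wa, threshold)


def resolve(tokens, threshold=0.8):
    """tokens: list of (text, extractor_score) pairs from all extractors."""
    groups = []
    for text, score in tokens:
        # Merge into the first group whose members are all related to this token.
        for group in groups:
            if all(related(text, other, threshold) for other, _ in group):
                group.append((text, score))
                break
        else:
            groups.append([(text, score)])
    # Group score is the sum of the extractor scores of the group's tokens.
    best = max(groups, key=lambda g: sum(s for _, s in g))
    # Assumption: within the winning group, pick the highest extractor score.
    return max(best, key=lambda item: item[1])[0]


tokens = [("Canon", 0.5), ("Canon Inc", 0.6), ("Cannon Inc", 0.4), ("Tokyo", 0.9)]
print(resolve(tokens))  # Canon Inc
```

Here the three Canon variants merge into one group with a group score of 1.5, which beats the single "Tokyo" token at 0.9 even though "Tokyo" has the highest individual extractor score.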
Address |
Tokyo JP |