Title: DEVICES, SYSTEMS, AND METHODS FOR RESOLVING NAMED ENTITIES
Abstract: An information processing apparatus to select a token from a document to describe a field of interest includes an obtaining unit, a determining unit, a clustering unit, and a selecting unit. The obtaining unit obtains a list of tokens output from extractors that received the document as an input. Each output token has an extractor score assigned to it by an extractor. The determining unit determines, as a word frequency value, a frequency of each word in the list of tokens, determines a token score for each token in the list of tokens, and determines a distance between each pair of tokens in the list of tokens. The clustering unit clusters the tokens in the list of tokens into a plurality of groups. The selecting unit selects a token within a group of the plurality of groups to describe the field of interest in the document.
Publication number: US2017060837(A1) Publication date: 2017.03.02
Application number: US201615253548 Filing date: 2016.08.31
Applicant: CANON KABUSHIKI KAISHA Inventors: Dusberger Dariusz T.; Dietz Quentin
Classification: G06F17/27; G06F17/30 Main classification: G06F17/27
Agency: Agent:
Principal claim: 1. A method for an information processing apparatus to select a token from a document to describe a field of interest in the document, the method comprising: obtaining a list of tokens output from a plurality of extractors that received the document as an input, wherein each output token has an extractor score assigned to by an extractor of the plurality of extractors; merging the tokens in the list of tokens into a plurality of groups, wherein each group in the plurality of groups includes tokens whose word tokenized form is a fuzzy sublist/superlist of one another; adding the extractor score of each token in a group to determine a group score for each of the plurality of groups; selecting the group with the highest group score from the plurality of groups; and selecting a token within the selected group to describe the field of interest in the document.
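The steps recited in claim 1 (merge tokens into groups, sum extractor scores per group, pick the highest-scoring group, pick a token within it) can be illustrated with a minimal Python sketch. This is not the patented implementation. Several details are assumptions made only for illustration: "word tokenized form" is taken to be lowercase whitespace splitting, "fuzzy sublist/superlist" is taken to mean a contiguous run of words that match with a `difflib.SequenceMatcher` ratio of at least 0.8, grouping is done greedily, and the final token is the one with the highest extractor score in the winning group. The function names (`words_match`, `is_fuzzy_sublist`, `related`, `resolve`) are hypothetical.

```python
from difflib import SequenceMatcher


def words_match(a, b, threshold=0.8):
    """Assumed fuzzy word test: similarity ratio at or above the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_fuzzy_sublist(short, long, threshold=0.8):
    """True if `short` fuzzily matches some contiguous run of words in `long`."""
    n, m = len(short), len(long)
    if n == 0 or n > m:
        return False
    return any(
        all(words_match(short[j], long[i + j], threshold) for j in range(n))
        for i in range(m - n + 1)
    )


def related(a, b, threshold=0.8):
    """True if either word list is a fuzzy sublist of the other."""
    return is_fuzzy_sublist(a, b, threshold) or is_fuzzy_sublist(b, a, threshold)


def resolve(tokens, threshold=0.8):
    """tokens: iterable of (text, extractor_score) pairs from all extractors.

    Returns (chosen_text, group_score), or None if there are no tokens.
    """
    groups = []  # each group is a list of (text, score, words)
    for text, score in tokens:
        words = text.lower().split()  # assumed word tokenization
        for group in groups:
            # Assumption: a token joins the first group in which it is a
            # fuzzy sublist/superlist of every existing member.
            if all(related(words, w, threshold) for _, _, w in group):
                group.append((text, score, words))
                break
        else:
            groups.append([(text, score, words)])
    if not groups:
        return None
    # Group score is the sum of the extractor scores of its tokens.
    best = max(groups, key=lambda g: sum(s for _, s, _ in g))
    group_score = sum(s for _, s, _ in best)
    # Assumption: within the selected group, take the highest-scored token.
    chosen = max(best, key=lambda t: t[1])
    return chosen[0], group_score


if __name__ == "__main__":
    sample = [("Canon Inc.", 0.6), ("Canon", 0.5),
              ("Cannon Inc", 0.4), ("Acme Corp", 0.9)]
    print(resolve(sample))
```

In the sample, the three Canon variants merge into one group whose summed score exceeds that of the single higher-scored "Acme Corp" token, which shows why summing per group can favor a value that several extractors agree on over one confident outlier.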
Address: Tokyo JP