DEVICES, SYSTEMS, AND METHODS FOR RESOLVING NAMED ENTITIES, Application No. US201615253548

Title of invention	DEVICES, SYSTEMS, AND METHODS FOR RESOLVING NAMED ENTITIES
Abstract	An information processing apparatus to select a token from a document to describe a field of interest includes an obtaining unit, a determining unit, a clustering unit, and a selecting unit. The obtaining unit obtains a list of tokens output from extractors that received the document as an input. Each output token has an extractor score assigned by an extractor. The determining unit determines, as a word frequency value, a frequency of each word in the list of tokens, determines a token score for each token in the list of tokens, and determines a distance between each token in the list of tokens. The clustering unit clusters each token in the list of tokens into a plurality of groups. The selecting unit selects a token within a group of the plurality of groups to describe the field of interest in the document.
Publication No.	US2017060837(A1)	Publication date	2017.03.02
Application No.	US201615253548	Filing date	2016.08.31
Applicant	CANON KABUSHIKI KAISHA	Inventors	Dusberger Dariusz T.;Dietz Quentin
Classification	G06F17/27;G06F17/30	Main classification	G06F17/27
Agency		Agent
Main claim	1. A method for an information processing apparatus to select a token from a document to describe a field of interest in the document, the method comprising: obtaining a list of tokens output from a plurality of extractors that received the document as an input, wherein each output token has an extractor score assigned to by an extractor of the plurality of extractors; merging the tokens in the list of tokens into a plurality of groups, wherein each group in the plurality of groups includes tokens whose word tokenized form is a fuzzy sublist/superlist of one another; adding the extractor score of each token in a group to determine a group score for each of the plurality of groups; selecting the group with the highest group score from the plurality of groups; and selecting a token within the selected group to describe the field of interest in the document.
Address	Tokyo JP
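
The steps recited in claim 1 (merge tokens into groups, sum extractor scores per group, pick the best group, pick a token from it) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. The function names (`words_match`, `is_fuzzy_sublist`, `related`, `resolve`), the 0.8 similarity threshold, the use of `difflib.SequenceMatcher` for fuzzy word matching, the greedy grouping order, and the rule of returning the highest-scored token in the winning group are all assumptions made for illustration; the claim does not specify any of them.

```python
from difflib import SequenceMatcher


def words_match(a, b, threshold=0.8):
    # Assumption: two words "fuzzily" match when their similarity ratio
    # (case-insensitive) reaches the threshold.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_fuzzy_sublist(short, long, threshold=0.8):
    # True if `short` occurs as a contiguous run of words inside `long`,
    # comparing words with fuzzy matching instead of strict equality.
    n = len(short)
    if n == 0 or n > len(long):
        return False
    return any(
        all(words_match(s, w, threshold) for s, w in zip(short, long[i:i + n]))
        for i in range(len(long) - n + 1)
    )


def related(a, b, threshold=0.8):
    # Two tokens are related when the word list of one is a fuzzy
    # sublist (equivalently, superlist) of the other.
    wa, wb = a.split(), b.split()
    return is_fuzzy_sublist(wa, wb, threshold) or is_fuzzy_sublist(wb, wa, threshold)


def resolve(tokens, threshold=0.8):
    # `tokens` is a list of (text, extractor_score) pairs collected from
    # all extractors. Returns the text chosen to describe the field.
    groups = []
    for text, score in tokens:
        # Greedy merge (assumption): join the first group whose members
        # are all related to this token, otherwise start a new group.
        for group in groups:
            if all(related(text, other, threshold) for other, _ in group):
                group.append((text, score))
                break
        else:
            groups.append([(text, score)])

    # Group score = sum of the extractor scores of its tokens.
    best_group = max(groups, key=lambda g: sum(s for _, s in g))

    # Assumption: within the winning group, return the token with the
    # highest individual extractor score.
    return max(best_group, key=lambda t: t[1])[0]


if __name__ == "__main__":
    candidates = [("Canon", 0.4), ("Canon Inc", 0.5), ("Canon Inc.", 0.3), ("Tokyo", 0.9)]
    print(resolve(candidates))
```

With these hypothetical inputs, the three "Canon" variants merge into one group whose summed score (about 1.2) exceeds that of the single "Tokyo" token (0.9), so the sketch returns "Canon Inc" even though "Tokyo" has the highest individual score.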