Title of Invention: Document analysis system
Abstract: An information processing apparatus (5) is provided comprising: a lexicon generation module (22) operable to process a set of documents (1) to identify key words (2) present in the documents; a link generation module (24) operable to generate network data (3) linking documents which share the same or semantically related key words identified by the lexicon generation module; and a network analysis module (26) operable to associate documents with metric values based upon the patterns of connectivity of the network data generated by the link generation module. The metric values associated with documents in the set can be utilized to select documents or groups of associated documents for further processing or indexing.
Publication No.: US8862586(B2); Publication Date: 2014.10.14
Application No.: US201113015832; Filing Date: 2011.01.28
Applicant: E-Therapeutics PLC; Inventor: Young Malcolm P.
Classification: G06F17/30; Main Classification: G06F17/30
Agency: Brooks, Cameron & Huebsch, PLLC; Agent: Brooks, Cameron & Huebsch, PLLC
Principal Claim: 1. A non-transitory computer readable medium storing computer interpretable instructions which when interpreted by a programmable computer cause the computer to: process a set of documents to identify items of semantic content present in the documents, wherein the items of semantic content are identified by appearing less than a predetermined threshold number of times and filtering out those items that appear in all the documents in the document set; associate each of the documents in the set with a node number; generate network data linking nodes associated with documents which share same or related items of semantic content; process the generated network data to generate metric values for each of the nodes based upon patterns of connectivity of links between the nodes as indicated by the generated network data; and utilize the generated metric values to select one or more documents from the set of documents as being representative of the contents of the set.
Address: Newcastle upon Tyne, Tyne and Wear, GB
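
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: items of semantic content are taken to be lowercase words, the threshold is applied to the total count across the whole set, only identical items create links (semantically related items are not modeled), and node degree stands in for the connectivity-based metric, which the claim does not specify. All function names and the `max_count` and `top_k` parameters are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def extract_items(documents, max_count):
    """Return, per document, the set of retained words.

    Assumption: a word is retained if it occurs fewer than max_count times
    across the whole set and does not appear in every document.
    """
    token_lists = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    totals = Counter(tok for toks in token_lists for tok in toks)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    n_docs = len(documents)
    keep = {
        tok for tok, count in totals.items()
        if count < max_count and doc_freq[tok] < n_docs
    }
    return [set(toks) & keep for toks in token_lists]


def build_links(items_per_doc):
    """Link node numbers (list indices) whose documents share an item."""
    links = set()
    for a, b in combinations(range(len(items_per_doc)), 2):
        if items_per_doc[a] & items_per_doc[b]:
            links.add((a, b))
    return links


def degree_metric(n_nodes, links):
    """Assumed metric: the number of links touching each node."""
    degree = [0] * n_nodes
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return degree


def select_representatives(documents, max_count, top_k=1):
    """Return the top_k node numbers by metric, plus all metric values."""
    items = extract_items(documents, max_count)
    links = build_links(items)
    metric = degree_metric(len(documents), links)
    # Highest metric first; ties are broken by the lower node number.
    ranked = sorted(range(len(documents)), key=lambda n: (-metric[n], n))
    return ranked[:top_k], metric
```

Calling `select_representatives(docs, max_count=3)` on a list of strings returns the node numbers of the most connected documents together with the per-node metric values. A different connectivity measure could replace `degree_metric` without changing the other steps.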