Title of invention: DISCOVERY INFORMATICS SYSTEM, METHOD AND COMPUTER PROGRAM
Abstract: A discovery informatics system is arranged to produce a graph based on a corpus of textual documents, the graph including documents in the corpus as nodes, with links between the nodes annotated by connecting concepts, the connecting concepts directly and indirectly connecting the documents. The system comprises: a contents interface arranged to download the document contents from the textual documents in the corpus; a preliminary processor arranged to extract a graph of concepts from the document contents, wherein nodes of the concept graph represent the concepts, which are entities in the documents and weighted edges between pairs of nodes are weighted relations between the entities, the weights representing the relative significance of particular relationships; a filter arranged to filter the weighted edges between the nodes to retain edges with higher weights providing candidate paths between all the concepts; at least two scoring modules each arranged to score the candidate paths according to a scoring measure, wherein the measures model different aspects of the fitness of the paths for discovering facts within the corpus; an optimiser arranged to identify optimised paths of the concept graph that satisfy the scoring measures in an optimal manner; a document graph generator arranged to generate a graph of the documents in the corpus with concept-annotated links between them based on the optimised paths; and a graphical user interface, GUI, arranged to enable the user to view and navigate the document graph to discover facts within the corpus.
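Publication number: US2016321357(A1)
Publication date: 2016.11.03
Application number: US201615086310
Filing date: 2016.03.31
Applicant: FUJITSU LIMITED
Inventors: Novacek Vit; Al Darra Suad; Vandenbussche Pierre-Yves
Classification: G06F17/30; G06F17/24
Main classification: G06F17/30
Agency:
Agent: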
Principal claim: 1. A discovery informatics system arranged to produce a graph based on a corpus of textual documents, the graph including documents in the corpus as nodes, with links between the nodes annotated by connecting concepts, the connecting concepts directly and indirectly connecting the documents, the system comprising: a contents interface arranged to download the document contents from the textual documents in the corpus; a preliminary processor arranged to extract a graph of concepts from the document contents, wherein nodes of the concept graph represent the concepts, which are entities in the documents and weighted edges between pairs of nodes are weighted relations between the entities, the weights representing the relative significance of particular relationships; a filter arranged to filter the weighted edges between the nodes to retain edges with higher weights providing candidate paths between all the concepts; at least two scoring modules each arranged to score the candidate paths according to a scoring measure, wherein the measures model different aspects of the fitness of the paths for discovering facts within the corpus; an optimiser arranged to identify optimised paths of the concept graph that satisfy the scoring measures in an optimal manner; a document graph generator arranged to generate a graph of the documents in the corpus with concept-annotated links between them based on the optimised paths; and a graphical user interface, GUI, arranged to enable the user to view and navigate the document graph to discover facts within the corpus.
Address: Kawasaki-shi JP
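The processing stages named in the abstract and principal claim (concept graph extraction, edge filtering, scoring by at least two measures, optimisation, document graph generation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: co-occurrence counts stand in for the edge weights, the two scoring measures (`score_strength`, `score_coverage`) are hypothetical, a Pareto front stands in for "satisfy the scoring measures in an optimal manner", and all function names are invented. The contents interface and the GUI are omitted; documents are assumed to be already reduced to sets of concepts.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(docs):
    """Weighted concept graph; weight = number of documents in which
    both concepts co-occur (an assumed stand-in for 'significance')."""
    weights = defaultdict(int)
    for concepts in docs.values():
        for a, b in combinations(sorted(concepts), 2):
            weights[(a, b)] += 1
    return dict(weights)


def filter_edges(weights, min_weight):
    """Retain only the edges whose weight reaches the threshold."""
    return {edge: w for edge, w in weights.items() if w >= min_weight}


def candidate_paths(edges, max_nodes=3):
    """Enumerate simple paths of 2..max_nodes concepts, one orientation each."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    paths = []

    def extend(path):
        if len(path) >= 2:
            paths.append(tuple(path))
        if len(path) == max_nodes:
            return
        for nxt in sorted(adj[path[-1]]):
            if nxt not in path:
                extend(path + [nxt])

    for start in sorted(adj):
        extend([start])
    return [p for p in paths if p[0] < p[-1]]


def score_strength(path, edges):
    """Hypothetical measure 1: mean edge weight along the path."""
    total = sum(edges[tuple(sorted(pair))] for pair in zip(path, path[1:]))
    return total / (len(path) - 1)


def score_coverage(path, docs):
    """Hypothetical measure 2: documents mentioning any concept on the path."""
    return sum(1 for concepts in docs.values() if concepts & set(path))


def pareto_front(scored):
    """Keep paths whose score tuple is not dominated by any other tuple."""
    return [
        path
        for path, s in scored
        if not any(t != s and all(x >= y for x, y in zip(t, s)) for _, t in scored)
    ]


def document_graph(paths, docs):
    """Link documents holding a path's end concepts; annotate with the path."""
    links = defaultdict(set)
    for path in paths:
        starts = [d for d, c in docs.items() if path[0] in c]
        ends = [d for d, c in docs.items() if path[-1] in c]
        for s in starts:
            for e in ends:
                if s != e:
                    links[tuple(sorted((s, e)))].add(path)
    return dict(links)


def run_pipeline(docs, min_weight=1):
    edges = filter_edges(build_concept_graph(docs), min_weight)
    paths = candidate_paths(edges)
    scored = [(p, (score_strength(p, edges), score_coverage(p, docs))) for p in paths]
    return document_graph(pareto_front(scored), docs)


if __name__ == "__main__":
    corpus = {
        "d1": {"aspirin", "inflammation"},
        "d2": {"inflammation", "cytokine"},
        "d3": {"cytokine", "fever"},
    }
    for pair, annotations in sorted(run_pipeline(corpus).items()):
        print(pair, sorted(annotations))
```

In this toy corpus, documents d1 and d3 share no concept, yet the sketch links them through a multi-concept path, which is one way to read the "indirectly connecting" language of the claim. A real system would replace each stand-in (entity extraction, weighting, scoring measures, optimiser) with the techniques the full specification describes.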