Title of invention: Automated document revision markup and change control
Abstract: Automated comparison of Darwin Information Typing Architecture (DITA) documents for revision mark-up includes reading document data from first and second DITA documents into respective document object model trees of nodes, and identifying and collapsing emphasis subtree nodes in the trees into their parent nodes, the collapsing caching emphasis data from the identified subtree nodes. A traversal transforms the model trees into respective node lists and captures adjacent sibling emphasis subtree nodes as single text nodes. The node lists are merged into a merged node list that recognizes as matches those node pairs having primary sort key information and document structure metadata meeting a match threshold, with differences between matching tokens of the node pairs saved. A merged document object model built from the refined merged node list is transformed into a hypertext mark-up language document.
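The collapsing and traversal steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the emphasis tag set (`b`, `i`, `u`), the function names `collapse_emphasis` and `preorder`, and the shape of the cache entries are assumptions made for illustration, and the sketch uses the standard `xml.etree.ElementTree` module rather than any DITA-specific tooling.

```python
import xml.etree.ElementTree as ET

# Assumed set of emphasis elements; the record does not enumerate them.
EMPHASIS_TAGS = {"b", "i", "u"}


def collapse_emphasis(parent, cache):
    """Fold emphasis children into the parent's text, caching what was folded."""
    # Handle descendants first so nested emphasis is already flattened.
    for child in list(parent):
        collapse_emphasis(child, cache)

    prev = None  # last surviving (non-emphasis) sibling seen so far
    for child in list(parent):
        if child.tag in EMPHASIS_TAGS:
            inner = "".join(child.itertext())
            # Cache the emphasis data so mark-up could be restored later.
            cache.append((parent.tag, child.tag, inner))
            merged = inner + (child.tail or "")
            if prev is None:
                parent.text = (parent.text or "") + merged
            else:
                prev.tail = (prev.tail or "") + merged
            parent.remove(child)
        else:
            prev = child


def preorder(elem, path=""):
    """Return (path, text) pairs in pre-order; tail text is ignored in this sketch."""
    here = f"{path}/{elem.tag}"
    nodes = [(here, (elem.text or "").strip())]
    for child in elem:
        nodes.extend(preorder(child, here))
    return nodes


root = ET.fromstring("<p>Press <b>Enter</b> to <i>save</i> now.</p>")
cache = []
collapse_emphasis(root, cache)
node_list = preorder(root)
```

After the call, the two adjacent emphasis runs have become part of a single text node on `p`, while `cache` still records which words were emphasized and with which tag.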
Publication number: US9619448(B2) Publication date: 2017.04.11
Application number: US201514844108 Filing date: 2015.09.03
Applicant: International Business Machines Corporation Inventor: Fischer Stephen E.
Classification: G06F17/22;G06F17/21;G06F17/24;G06F17/30 Main classification: G06F17/22
Agency: Driggs, Hogg, Daugherty & Del Zoppo Co., LPA Agent: Daugherty Patrick J.;Driggs, Hogg, Daugherty & Del Zoppo Co., LPA
Principal claim: 1. A computer-implemented method for automated comparison of Darwin Information Typing Architecture (DITA) documents, the method comprising executing on a processor the steps of: reading document data from a first DITA table document into a first document object model tree comprising a plurality of nodes, and from a second DITA table document into a second document object model tree comprising a plurality of nodes; normalizing table attributes of the first document object model tree and the second document object model tree; transforming via preorder traversal the first document object model tree into a first pre-order node list output, and the second document object model tree into a second pre-order node list output; constructing unique table header labels for nodes in the first pre-order node list output, and for nodes in the second pre-order node list output; comparing, via a table-specific fuzzy Longest Common Subsequence (LCS) process, the unique table header labels of the first constructed pre-order node list output to the unique table header labels of the second constructed pre-order node list output, to thereby generate a merged header node list; analyzing column header name text to distinguish between new and modified columns in the first constructed pre-order node list output and in the second constructed pre-order node list output; and generating a first column name map from old column names for the first constructed pre-order node list output to column names in the merged header node list, and a second column name map from old column names for the second constructed pre-order node list output to column names in the merged header node list.
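The header comparison steps of the claim (unique labels, fuzzy LCS, new versus modified columns, and the two column name maps) can be sketched as follows. This is a generic fuzzy LCS under stated assumptions, not the table-specific process the claim refers to: the similarity measure (`difflib.SequenceMatcher` ratio), the 0.6 threshold, the `#n` suffix scheme for duplicate headers, and all function names are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.6):
    """Assumed fuzzy test: case-insensitive difflib ratio at or above threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def unique_labels(headers):
    """Make labels unique by suffixing repeats: Name, Name#2, ..."""
    seen, out = {}, []
    for h in headers:
        seen[h] = seen.get(h, 0) + 1
        out.append(h if seen[h] == 1 else f"{h}#{seen[h]}")
    return out


def fuzzy_lcs_merge(old, new, threshold=0.6):
    n, m = len(old), len(new)
    # dp[i][j] = length of the fuzzy LCS of old[i:] and new[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            if similar(old[i], new[j], threshold):
                best = max(best, dp[i + 1][j + 1] + 1)
            dp[i][j] = best

    merged, old_map, new_map, status = [], {}, {}, {}
    i = j = 0
    while i < n and j < m:
        if similar(old[i], new[j], threshold) and dp[i][j] == dp[i + 1][j + 1] + 1:
            # Matched pair: the merged list keeps the newer header text.
            merged.append(new[j])
            old_map[old[i]] = new[j]
            new_map[new[j]] = new[j]
            status[new[j]] = "same" if old[i] == new[j] else "modified"
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            merged.append(old[i])
            old_map[old[i]] = old[i]
            status[old[i]] = "removed"
            i += 1
        else:
            merged.append(new[j])
            new_map[new[j]] = new[j]
            status[new[j]] = "new"
            j += 1
    for name in old[i:]:  # leftover old headers have no counterpart
        merged.append(name)
        old_map[name] = name
        status[name] = "removed"
    for name in new[j:]:  # leftover new headers are additions
        merged.append(name)
        new_map[name] = name
        status[name] = "new"
    return merged, old_map, new_map, status


old_headers = unique_labels(["Name", "Price", "Notes"])
new_headers = unique_labels(["Name", "Unit Price", "Stock", "Notes"])
merged, old_map, new_map, status = fuzzy_lcs_merge(old_headers, new_headers)
```

In this example `Price` and `Unit Price` are close enough to be treated as one modified column, whereas `Stock` has no counterpart and is reported as new. The two maps let cells from either table be placed under the merged header list.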
Address: Armonk NY US