Title of invention Electronic document encoding
Abstract In general, the subject matter described in this disclosure can be embodied in methods, systems and program products. An input document is received. A computing system determines whether a first portion of text is listed in a table of textual content. A computing system inserts into an output document, as a result of determining that the first portion of text is not listed in the table of textual content, the first portion of text. A computing system adds the first portion of text into the table of textual content. A computing system determines whether a second portion of text is listed in the table of textual content, wherein the second portion of text matches the first portion of text. A computing system inserts a reference to the first portion of text from the table of textual content into the output document. A computing system stores the output document.
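The table-of-textual-content step in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the token shape (`("ref", index)`), the function names `encode_texts` and `decode_texts`, and the use of a zero-based insertion index as the reference are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical token shape: a literal string, or a ("ref", index) pair
# pointing at an earlier entry in the table of textual content.
Token = Union[str, Tuple[str, int]]


def encode_texts(portions: List[str]) -> List[Token]:
    """Emit each text literally the first time, and a table reference after that."""
    table: Dict[str, int] = {}  # text -> index assigned on first occurrence
    output: List[Token] = []
    for text in portions:
        if text in table:
            # Already listed: insert a reference instead of the text itself.
            output.append(("ref", table[text]))
        else:
            # Not listed: insert the text, then add it to the table.
            output.append(text)
            table[text] = len(table)
    return output


def decode_texts(tokens: List[Token]) -> List[str]:
    """Rebuild the original texts by replaying the same table construction."""
    table: List[str] = []
    result: List[str] = []
    for token in tokens:
        if isinstance(token, tuple):
            result.append(table[token[1]])
        else:
            result.append(token)
            table.append(token)
    return result
```

Because the decoder rebuilds the table in the same order as the encoder, the table itself would not need to be stored alongside the output in this sketch; whether the patent stores it is not stated in this record.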
Publication number US8880538(B1) Publication date 2014.11.04
Application number US201213369028 Filing date 2012.02.08
Applicant Google Inc. Inventor Petersson Daniel
Classification G06F17/30 Main classification G06F17/30
Agency McDonnell Boehnen Hulbert & Berghoff LLP Agent McDonnell Boehnen Hulbert & Berghoff LLP
Principal claim 1. A computer-implemented method for encoding a document, the method comprising: receiving an input document at a computing system that includes a plurality of data elements that identify semantics of respective data values in the input document, wherein the respective data values correspond to respective textual content in the input document; determining, by the computing system, whether a name of a first data element in the input document is listed in a table of data element names; generating, based on the input document, an output document that includes, as a result of determining that the name of the first data element is listed in the table of data element names, a reference from the table of data element names that corresponds to the name of the first data element, wherein the reference identifies the first data element with less characters than the name of the first data element; determining, by the computing system, whether a first data value that corresponds to textual content of the first data element in the input document is listed in a table of textual content; inserting, by the computing system, the first data value into the output document as a result of determining that the first data value is not listed in the table of textual content; updating, by the computing system, the table of textual content to include the first data value as a result of determining that the first data value is not listed in the table of textual content; determining, by the computing system, whether a name of a second data element in the input document is listed in the table of data element names; inserting, by the computing system, the name of the second data element into the output document, as a result of determining that the name of the second data element is not listed in the table of data element names; determining, by the computing system, whether a second data value that corresponds to textual content of the second data element in the input document is listed in the updated table of textual content, wherein the textual content of the second data element matches the textual content of the first data element; inserting, by the computing system, into the output document, as a result of determining that the second data value is listed in the updated table of textual content, a reference to the first data value from the updated table of textual content; and storing, by the computing system, the output document having (i) the reference to the name of the first data element from the table of data element names, (ii) the first data value, (iii) the name of the second data element, and (iv) the reference to the first data value from the updated table of textual content.
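The claim combines two lookups per data element: one against a table of data element names and one against a table of textual content that grows during encoding. The sketch below walks through that flow under several assumptions that are not in the claim: elements are modeled as `(name, value)` pairs, the name table is supplied up front as a dictionary, and the tags `"name_ref"`, `"name"`, `"text_ref"`, and `"text"` are hypothetical markers for references and literals. The claim does not say that unlisted names are added to the name table, so this sketch leaves that table unchanged.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tagged token: (kind, payload), where payload is a table
# index for references or the literal string otherwise.
Token = Tuple[str, Union[int, str]]


def encode_elements(
    elements: List[Tuple[str, str]],
    name_table: Dict[str, int],
) -> List[Tuple[Token, Token]]:
    """Encode (name, value) pairs using a fixed name table and a growing text table."""
    text_table: Dict[str, int] = {}  # textual content -> index, built during encoding
    output: List[Tuple[Token, Token]] = []
    for name, value in elements:
        # Element name: use a short reference when the name is listed.
        if name in name_table:
            name_token: Token = ("name_ref", name_table[name])
        else:
            name_token = ("name", name)
        # Data value: literal on first occurrence, reference afterwards.
        if value in text_table:
            value_token: Token = ("text_ref", text_table[value])
        else:
            value_token = ("text", value)
            text_table[value] = len(text_table)
        output.append((name_token, value_token))
    return output


# Mirrors the claim: the first name is listed, the second is not, and both
# elements carry the same textual content.
example = encode_elements(
    [("title", "Hello"), ("x-custom", "Hello")],
    name_table={"title": 0},
)
```

In this example the output holds the four items enumerated at the end of the claim: a reference to the first name, the first data value, the second name as a literal, and a reference to the first data value.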
Address Mountain View CA US