摘要 |
Some embodiments of a high-accuracy document information element-vector (IE-vector) encoding server have been presented. In one embodiment, the high-accuracy document IE-vector encoding server applies finite state automaton (FSA) to parse a document to identify one or more information elements (IEs) in the document. Then a DNA sequence of the document is derived based on the one or more IEs. The concept of DNA sequence of a document is powerful and can be used in building automated tools such as computer based processes to automatically reason and search for similarity, dissimilarity, equivalence and other relationships between structured, semi-structured and unstructured data and information. The DNA sequence of a document provides powerful paradigm to build sophisticated information and data search and retrieval techniques and tools.
|