METHOD AND APPARATUS FOR REPRESENTATION OF UNSTRUCTURED DATA
Abstract
Method and apparatus providing a binary representation of a document storing unstructured data. A unique word identifier is obtained for each word included in the document. A word select vector includes positions identified by different word identifiers. A 1-bit value is stored at positions identified by the word identifiers of the words included in the document. A unique position identifier is further assigned to each word appearing in the document. A word use set includes vectors for each unique word identifier for which a 1-bit is stored in the word select vector. Each vector in the word use set indicates the position identifiers of the instances of a particular word included in the document. Once the binary representation is generated, it may be efficiently searched to determine whether particular words appear in the document.
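The structures described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_representation`, `contains_word`, `word_positions`), the whitespace tokenization, the use of Python integers as bit vectors, and the assignment of word identifiers in order of first appearance are all assumptions made for illustration. Position identifiers are assumed to be zero-based word offsets.

```python
def build_representation(text, dictionary):
    """Build a (word select vector, word use set) pair for one document.

    `dictionary` maps word -> unique word identifier and is extended in
    place for unseen words (hypothetical scheme: next free integer).
    Bit vectors are modelled as Python ints; bit i set means a 1-bit is
    stored at position i.
    """
    words = text.lower().split()  # assumed tokenization: whitespace only
    word_select = 0               # bit per word identifier
    word_use = {}                 # word identifier -> bit per position identifier

    for position_id, word in enumerate(words):
        word_id = dictionary.setdefault(word, len(dictionary))
        word_select |= 1 << word_id
        word_use[word_id] = word_use.get(word_id, 0) | (1 << position_id)

    return word_select, word_use


def contains_word(word_select, dictionary, word):
    """Test a single bit of the word select vector."""
    word_id = dictionary.get(word.lower())
    return word_id is not None and (word_select >> word_id) & 1 == 1


def word_positions(word_use, dictionary, word):
    """List the position identifiers at which `word` occurs."""
    word_id = dictionary.get(word.lower())
    bits = word_use.get(word_id, 0)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


dictionary = {}
select, use = build_representation("the cat saw the dog", dictionary)
print(contains_word(select, dictionary, "cat"))   # True
print(contains_word(select, dictionary, "bird"))  # False
print(word_positions(use, dictionary, "the"))     # [0, 3]
```

In this sketch a membership query reduces to one bit test on the word select vector, and the word use set is consulted only when the positions of a word that is present are needed.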
Application publication number
WO2007008871(A2)
Application publication date
2007.01.18
Application number
WO2006US26845
Filing date
2006.07.11
Applicant
SAND TECHNOLOGY SYSTEMS INTERNATIONAL, INC.;MCCOOL, MICHAEL;WALD, LINDA, ANN