Title METHOD AND APPARATUS FOR REPRESENTATION OF UNSTRUCTURED DATA
Abstract Method and apparatus providing a binary representation of a document storing unstructured data. A unique word identifier is obtained for each word included in the document. A word select vector includes positions identified by different word identifiers. A 1-bit value is stored at positions identified by the word identifiers of the words included in the document. A unique position identifier is further assigned to each word appearing in the document. A word use set includes vectors for each unique word identifier for which a 1-bit is stored in the word select vector. Each vector in the word use set indicates the position identifiers of the instances of a particular word included in the document. Once the binary representation is generated, it may be efficiently searched to determine whether particular words appear in the document.
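The structure described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the whitespace tokenizer, the use of Python integers as bit vectors, and the choice of sequential word offsets as position identifiers are all assumptions made for illustration.

```python
def build_representation(text, vocabulary):
    """Build a (word_select, word_use) pair for one document.

    vocabulary is a hypothetical dict mapping word -> unique word identifier;
    unseen words are assigned the next free identifier.
    Bit vectors are plain Python ints: bit i stands for identifier i.
    """
    words = text.lower().split()  # assumed tokenizer: split on whitespace
    word_select = 0               # bit w is 1 if word identifier w occurs
    word_use = {}                 # word identifier -> vector of positions
    for position, word in enumerate(words):  # assumed position identifiers: 0, 1, 2, ...
        word_id = vocabulary.setdefault(word, len(vocabulary))
        word_select |= 1 << word_id
        word_use[word_id] = word_use.get(word_id, 0) | (1 << position)
    return word_select, word_use


def contains(word, vocabulary, word_select):
    """Test a single bit of the word select vector."""
    word_id = vocabulary.get(word.lower())
    return word_id is not None and bool((word_select >> word_id) & 1)


def positions(word, vocabulary, word_select, word_use):
    """List the position identifiers at which the word occurs."""
    if not contains(word, vocabulary, word_select):
        return []
    vector = word_use[vocabulary[word.lower()]]
    return [i for i in range(vector.bit_length()) if (vector >> i) & 1]


vocabulary = {}
word_select, word_use = build_representation("the cat saw the dog", vocabulary)
print(contains("cat", vocabulary, word_select))              # True
print(contains("bird", vocabulary, word_select))             # False
print(positions("the", vocabulary, word_select, word_use))   # [0, 3]
```

In this sketch a membership query inspects one bit of the word select vector, and the word use set holds a vector only for words that actually occur, which mirrors the relationship between the two structures stated in the abstract.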
Publication number WO2007008871(A2) Publication date 2007.01.18
Application number WO2006US26845 Filing date 2006.07.11
Applicant SAND TECHNOLOGY SYSTEMS INTERNATIONAL, INC.;MCCOOL, MICHAEL;WALD, LINDA, ANN Inventor MCCOOL, MICHAEL;WALD, LINDA, ANN
Classification G06F7/00 Main classification G06F7/00
Agency Agent
Principal claim
Address