发明名称 |
Two-level n-gram index structure and methods of index building, query processing and index derivation |
摘要 |
Disclosed relates to a structure of two-level n-gram inverted index and methods of building the same, processing queries and deriving the index that reduce the size of n-gram inverted index and improves the query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index using subsequences extracted from documents as a term and a front-end inverted index using n-grams extracted from the subsequences as a term. The back-end inverted index uses the subsequences of a specific length extracted from the documents to be overlapped with each other by n−1 (n: the length of n-gram) as a term and stores position information of the subsequences occurring in the documents in a posting list for the respective subsequences. The front-end inverted index uses the n-grams of a specific length extracted from the subsequences using a 1-sliding technique as a term and stores position information of the n-grams occurring in the subsequences in a posting list for the respective n-grams.
|
申请公布号 |
US7792840(B2) |
申请公布日期 |
2010.09.07 |
申请号 |
US20060501265 |
申请日期 |
2006.08.09 |
申请人 |
KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY |
发明人 |
WHANG KYU-YOUNG;KIM MIN-SOO;LEE JAE-GIL;LEE MIN-JAE |
分类号 |
G06F7/00;G06F17/30 |
主分类号 |
G06F7/00 |
代理机构 |
|
代理人 |
|
主权项 |
|
地址 |
|