Title of invention |
A VERY FAST APPARATUS AND METHOD FOR DETECTING SIMILAR SECTIONS USING BURROWS-WHEELER TRANSFORM AND FM-INDEX |
Abstract |
The present invention relates to an apparatus and a method for rapidly and automatically detecting, within a large body of Korean text, sections similar to a short query text, using the Burrows-Wheeler transform and an FM-index. The apparatus includes: a text receiver that receives from a user the text to be compared for similarity; a pre-processing module that extracts the initial consonants from the text and maps each one to a single byte, generating a compressed skin file composed only of initial consonants; a main processing module that builds an index from the skin file using the Burrows-Wheeler transform and the FM-index data structure, divides the query text into text fragments, and uses the index to find the position of each fragment in the original text; and a post-processing module that calculates density from the position information of the fragments found by the main processing module and detects dense sections as similar sections. |
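The three processing stages named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `to_skin`, `FMIndex`, `similar_sections` and the parameters `frag_len`, `window`, `min_hits` are hypothetical, the byte codes 1 to 19 are an assumed mapping, non-Hangul characters are simply dropped, and the suffix array is built by naive sorting, which would be far too slow for a large corpus.

```python
from bisect import bisect_left

HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
SYLLABLES_PER_INITIAL = 21 * 28  # 588 precomposed syllables per initial consonant


def to_skin(text):
    """Pre-processing: keep one byte (1..19) per Hangul syllable, its initial consonant."""
    return bytes((ord(ch) - HANGUL_BASE) // SYLLABLES_PER_INITIAL + 1
                 for ch in text if HANGUL_BASE <= ord(ch) <= HANGUL_LAST)


class FMIndex:
    """Main processing: BWT-based index supporting backward search."""

    def __init__(self, skin):
        data = skin + b"\x00"  # sentinel, smaller than every consonant code
        # Naive suffix array (assumption: acceptable only for a small demo).
        self.sa = sorted(range(len(data)), key=lambda i: data[i:])
        bwt = bytes(data[i - 1] for i in self.sa)
        symbols = sorted(set(data))
        # c[s]: number of symbols in data strictly smaller than s.
        self.c, total = {}, 0
        for s in symbols:
            self.c[s] = total
            total += data.count(s)
        # occ[s][i]: occurrences of s in bwt[:i].
        self.occ = {s: [0] * (len(bwt) + 1) for s in symbols}
        for i, b in enumerate(bwt):
            for s in symbols:
                self.occ[s][i + 1] = self.occ[s][i] + (1 if b == s else 0)

    def locate(self, pattern):
        """Return sorted start positions of pattern in the skin."""
        lo, hi = 0, len(self.sa)
        for ch in reversed(pattern):
            if ch not in self.c:
                return []
            lo = self.c[ch] + self.occ[ch][lo]
            hi = self.c[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


def similar_sections(original, query, frag_len=3, window=8, min_hits=2):
    """Post-processing: report skin ranges where fragment hits are dense."""
    index = FMIndex(to_skin(original))
    q = to_skin(query)
    # Assumed fragmentation: fixed-length, non-overlapping fragments.
    hits = sorted(p for i in range(0, len(q) - frag_len + 1, frag_len)
                  for p in index.locate(q[i:i + frag_len]))
    sections = []
    for start in hits:
        # Density = number of hits falling in [start, start + window).
        n = bisect_left(hits, start + window) - bisect_left(hits, start)
        if n >= min_hits:
            end = start + window
            if sections and start <= sections[-1][1]:
                sections[-1] = (sections[-1][0], end)  # merge overlapping windows
            else:
                sections.append((start, end))
    return sections
```

The returned ranges are offsets into the initial-consonant stream; they coincide with character offsets in the original text only when that text consists entirely of Hangul syllables, so a real system would also need a mapping back to original positions.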
Application publication number |
KR20140094986(A) |
Application publication date |
2014.07.31 |
Application number |
KR20130007725 |
Filing date |
2013.01.23 |
Applicant |
PUSAN NATIONAL UNIVERSITY INDUSTRY-UNIVERSITY COOPERATION FOUNDATION |
Inventor |
CHO, HWAN GUE;OCK, CHANG SEOK;PARK, SUN YOUNG |
Classification number |
G06F17/30;G06F17/21;G06F17/27 |
Main classification number |
G06F17/30 |
Agency |

Agent |

Principal claim |

Address |
|