Title of invention: A VERY FAST APPARATUS AND METHOD FOR DETECTING SIMILAR SECTIONS USING BURROWS-WHEELER TRANSFORM AND FM-INDEX
Abstract: The present invention relates to an apparatus and a method that use the Burrows-Wheeler transform and an FM-index to detect, automatically and very quickly, the sections of a large body of Korean text that are similar to a short query text. The apparatus includes: a text receiver, which receives from a user the text to be compared for similarity; a pre-processing module, which extracts the initial consonants from the text and maps and compresses each of them to one byte, generating a skin file composed of the initial consonants only; a main processing module, which builds an index over the skin file using the Burrows-Wheeler transform and the FM-index data structure, divides the query text into text fragments, and uses the index to find the position of each fragment in the original text; and a post-processing module, which calculates density from the position information of the fragments found by the main processing module and reports each dense section as a similar section.
Publication number: KR20140094986(A) Publication date: 2014.07.31
Application number: KR20130007725 Filing date: 2013.01.23
Applicant: PUSAN NATIONAL UNIVERSITY INDUSTRY-UNIVERSITY COOPERATION FOUNDATION Inventors: CHO, HWAN GUE;OCK, CHANG SEOK;PARK, SUN YOUNG
Classification: G06F17/30;G06F17/21;G06F17/27 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
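The three processing stages named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not specify the byte mapping, the fragment length, or the density measure, so every name and parameter here (`skin`, `FMIndex`, `dense_sections`, `k`, `window`, `min_hits`) is hypothetical. The sketch assumes precomposed Hangul syllables (U+AC00 to U+D7A3), skips all other characters, uses overlapping fragments of length `k`, and measures density as the number of fragment hits per fixed-size window of the skin sequence. The suffix array is built naively, which is fine for illustration only.

```python
from collections import Counter

HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
SYLLABLES_PER_INITIAL = 21 * 28  # 21 vowels x 28 finals per initial consonant


def skin(text):
    """Pre-processing: one byte (1..19) per Hangul syllable, plus original offsets.

    Assumption: non-Hangul characters are skipped; 0 is reserved as a sentinel.
    """
    codes, positions = [], []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if HANGUL_BASE <= cp <= HANGUL_LAST:
            codes.append((cp - HANGUL_BASE) // SYLLABLES_PER_INITIAL + 1)
            positions.append(i)
    return bytes(codes), positions


class FMIndex:
    """Main processing: BWT-based index with exact-match backward search."""

    def __init__(self, data):
        s = data + b"\x00"  # sentinel, smaller than every skin code
        # Naive suffix array (illustrative; not suitable for large inputs).
        self.sa = sorted(range(len(s)), key=lambda i: s[i:])
        bwt = bytes(s[i - 1] for i in self.sa)
        counts = Counter(s)
        # C[c]: number of symbols in s that are strictly smaller than c.
        self.C, total = {}, 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(bwt) + 1) for c in counts}
        for i, b in enumerate(bwt):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (b == c)

    def locate(self, pattern):
        """Return the sorted start positions of pattern in the indexed data."""
        lo, hi = 0, len(self.sa)
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


def dense_sections(text, query, k=2, window=4, min_hits=2):
    """Post-processing: windows of the text that collect at least min_hits hits.

    Returns (start, end) character offsets into the original text.
    """
    codes, pos = skin(text)
    index = FMIndex(codes)
    q, _ = skin(query)
    hits = []
    for j in range(len(q) - k + 1):  # overlapping fragments of length k
        hits.extend(index.locate(q[j:j + k]))
    per_window = Counter(h // window for h in hits)
    sections = []
    for w in sorted(per_window):
        if per_window[w] >= min_hits:
            start = w * window
            end = min(start + window, len(codes)) - 1
            sections.append((pos[start], pos[end]))
    return sections
```

Because only initial consonants are compared, a query such as "보수어지" matches the text span "바사아자" even though the vowels differ. A production system would replace the naive suffix sort and the full occurrence table with sampled structures to keep the index compact.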