摘要 |
A method for efficient and quick string matching is presented. The algorithm gains its efficiency through the assumption that the text to be searched is large and that the pattern searched for is also somewhat large. A preprocessing step is performed on the text and the pattern that consists of finding the locations of matches with a small patch of characters that occurs commonly in both the text and pattern. The distances between successive small patch matching locations (called interdistances) are stored as lists. Based on comparison of the interdistance lists, the probability of match can be calculated. The method is fast because the interdistance lists are much smaller than the text and pattern data and comparing these two smaller lists is significantly faster than comparing the text and pattern data using existing algorithms.
|