Abstract |
Disclosed herein are a computer-implemented method and system for grouping spend items in a list of said spend items, and for detecting outliers. The spend items entered into the spend database are phonetically sorted and grouped into second level clusters by the spend data clustering engine. In the first level of clustering, first level clusters are created by matching the spend items using generated word tokens and sorted sound codes. The unique spend items in the list generated after the first level of clustering are further matched to create second level clusters. The first level clusters are then updated based on the second level of clustering. To determine discrepancies in clustering and spend, statistically deviating outliers are detected in each second level cluster. The engine provides clustering at configurable levels of accuracy. The engine's specific combination of word token matching and sound code matching provides accurate results for spend items.
|
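The first level matching and the outlier step described in the abstract can be illustrated with a minimal sketch. The abstract does not name a phonetic algorithm or a statistical test, so the use of Soundex for the sound codes and a z-score threshold for outliers are assumptions made here for illustration only, as are the function names `soundex`, `cluster_key`, `first_level_clusters`, and `detect_outliers`. The second level of clustering is not sketched because the abstract gives no matching rule for it.

```python
import re
import statistics
from collections import defaultdict

# Soundex digit table (assumption: the abstract does not name the sound code scheme).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of a word, or '' if it has no letters."""
    word = re.sub(r"[^A-Za-z]", "", word).upper()
    if not word:
        return ""
    out = word[0]
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return (out + "000")[:4]


def cluster_key(item):
    """Tokenize a spend item and return its sorted sound codes as a tuple."""
    tokens = re.findall(r"[a-z0-9]+", item.lower())
    # Purely numeric tokens have no sound code, so they are kept as-is.
    return tuple(sorted(soundex(t) or t for t in tokens))


def first_level_clusters(items):
    """Group items whose sorted sound codes match exactly."""
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_key(item)].append(item)
    return dict(clusters)


def detect_outliers(amounts, threshold=2.0):
    """Return amounts whose z-score magnitude exceeds the threshold (assumed test)."""
    if len(amounts) < 2:
        return []
    mean = statistics.mean(amounts)
    std = statistics.pstdev(amounts)
    if std == 0:
        return []
    return [a for a in amounts if abs(a - mean) / std > threshold]
```

Because the sound codes are sorted before comparison, items such as "HP Laser Printer" and "printer lazer hp" receive the same key despite differing word order and spelling. The `threshold` parameter is a hypothetical stand-in for the configurable accuracy the abstract mentions.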