Main claim
1. A computer-implemented method of identifying documents sharing at least one common underlying structure, comprising:
detecting, by at least one computer, occurrences of a plurality of predetermined image features in a plurality of document images, wherein at least one of the plurality of predetermined image features is common among instances of a form;
indexing, by the at least one computer, the plurality of document images in an image index based on the detected image features;
building, by the at least one computer, a graph of connected nodes for the plurality of document images by searching the image index, wherein nodes representing instances of a predefined document type are connected by edges in the graph; and
identifying, by the at least one computer, the documents sharing common underlying structures using the graph.
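The claimed steps (detect features, index images, search the index to build a graph, identify shared structures from the graph) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the claim: feature detection is stubbed out, so each document image is represented by a precomputed set of hashable feature tokens (for example, quantized keypoint descriptors); the image index is a plain inverted index; an edge is added when two documents share at least a hypothetical `min_shared` number of features; and "identifying" is approximated by taking connected components of the graph. The function names `build_image_index`, `build_graph`, and `group_by_structure` are hypothetical and used only for illustration.

```python
from collections import defaultdict


def build_image_index(doc_features):
    """Hypothetical inverted index: feature token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, features in doc_features.items():
        for feature in features:
            index[feature].add(doc_id)
    return index


def build_graph(doc_features, index, min_shared=2):
    """Search the index with each document's features and connect documents
    that share at least `min_shared` features (an assumed edge rule)."""
    edges = defaultdict(set)
    for doc_id, features in doc_features.items():
        hits = defaultdict(int)  # candidate document -> number of shared features
        for feature in features:
            for other in index[feature]:
                if other != doc_id:
                    hits[other] += 1
        for other, count in hits.items():
            if count >= min_shared:
                edges[doc_id].add(other)
                edges[other].add(doc_id)
    return edges


def group_by_structure(doc_features, edges):
    """Assumed identification step: each connected component is treated
    as one group of documents sharing a common underlying structure."""
    seen, groups = set(), []
    for start in sorted(doc_features):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbor in edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    # Toy feature sets standing in for detected image features (hypothetical data).
    sample = {
        "a": {"logo", "box1", "box2"},
        "b": {"logo", "box1", "box2", "sig"},
        "c": {"header", "table"},
        "d": {"header", "table", "footer"},
        "e": {"photo"},
    }
    index = build_image_index(sample)
    edges = build_graph(sample, index, min_shared=2)
    print(group_by_structure(sample, edges))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Raising `min_shared` makes edges stricter. In this toy data, a threshold of 4 disconnects every pair, so each document ends up in its own group. Connected components are only one plausible reading of "using the graph"; a real system might instead use clique detection or community detection to separate near-duplicate forms.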