Abstract |
An automated technique compares two sets of documents (such as two source codebases) to determine which documents in one set are similar to documents in the other. The technique constructs a matrix that relates each pair of documents, one from the first set and one from the second, to the lines that occur in both documents of the pair. A similarity score is then calculated for each pair of documents based on the shared lines recorded in the matrix. |
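
The matrix construction and scoring described above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not say how lines are normalized or how the score is computed, so the whitespace stripping and the Jaccard-style ratio (shared lines divided by the distinct lines of both documents) are assumptions, and the function names `normalized_lines`, `build_shared_line_matrix`, and `similarity_scores` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def normalized_lines(text):
    """Return the distinct non-blank lines of a document, whitespace-stripped.

    Assumption: the abstract does not define line normalization.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def build_shared_line_matrix(set_a, set_b):
    """Map each (doc_a, doc_b) pair to the lines that occur in both documents.

    set_a and set_b are dicts of document name -> document text.
    Pairs with no shared line are simply absent from the matrix.
    """
    lines_a = {name: normalized_lines(text) for name, text in set_a.items()}
    lines_b = {name: normalized_lines(text) for name, text in set_b.items()}

    # Inverted indexes: line -> names of the documents that contain it.
    index_a = defaultdict(set)
    for name, lines in lines_a.items():
        for line in lines:
            index_a[line].add(name)
    index_b = defaultdict(set)
    for name, lines in lines_b.items():
        for line in lines:
            index_b[line].add(name)

    # Only lines present in both sets can contribute to a pair.
    matrix = defaultdict(set)
    for line in index_a.keys() & index_b.keys():
        for pair in product(index_a[line], index_b[line]):
            matrix[pair].add(line)
    return matrix, lines_a, lines_b


def similarity_scores(matrix, lines_a, lines_b):
    """Score each pair as shared lines / distinct lines in either document.

    Assumption: this Jaccard-style ratio stands in for the unspecified score.
    """
    scores = {}
    for (a, b), shared in matrix.items():
        union = lines_a[a] | lines_b[b]
        scores[(a, b)] = len(shared) / len(union)
    return scores
```

Building the matrix from inverted indexes means that only document pairs sharing at least one line are ever visited, which avoids comparing every document in the first set against every document in the second.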