Abstract |
A method and system for detecting plagiarism of software source code. In one embodiment, a first set of arrays and a second set of arrays are created for a first program source code file and a second program source code file respectively. Each pair of arrays in the first and second sets has entries corresponding to program elements of a distinct program element type such as functional program code, program comments, or program code identifiers. Next, each pair of arrays from the first and second sets is compared to find similar entries, and an intermediate match score is calculated for each pair of arrays based on the similar entries. Further, the resulting intermediate match scores are combined to produce a combined match score, which is then used to provide an indication of copying with respect to the first program source code file and the second program source code file.
|
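The flow described in the abstract (one array per program element type for each file, a pairwise comparison that yields intermediate match scores, then a combined score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names `extract_elements`, `intermediate_score`, and `combined_score`, the naive `//` comment splitting for C-like code, the small `KEYWORDS` set, the use of a Jaccard index as the similarity measure, and the equal-weight average used to combine scores.

```python
import re

# Hypothetical keyword list, used only so keywords are not counted as identifiers.
KEYWORDS = {"int", "return", "if", "else", "for", "while", "void"}


def extract_elements(source):
    """Build one array per element type for a C-like source file.

    Assumption: comments are introduced by '//' and never appear inside
    string literals. A real tool would use a proper lexer.
    """
    statements, comments, identifiers = [], [], []
    for line in source.splitlines():
        code, _, comment = line.partition("//")
        code, comment = code.strip(), comment.strip()
        if comment:
            comments.append(comment)
        if code:
            # Normalize whitespace so reformatting alone does not hide a match.
            statements.append(" ".join(code.split()))
            for name in re.findall(r"[A-Za-z_]\w*", code):
                if name not in KEYWORDS and name not in identifiers:
                    identifiers.append(name)
    return {
        "statements": statements,
        "comments": comments,
        "identifiers": identifiers,
    }


def intermediate_score(entries_a, entries_b):
    """Jaccard index of two arrays: shared distinct entries / all distinct entries."""
    set_a, set_b = set(entries_a), set(entries_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def combined_score(first_source, second_source, weights=None):
    """Return (per-type intermediate scores, weighted-average combined score)."""
    # Assumption: equal weights unless the caller supplies others.
    weights = weights or {"statements": 1.0, "comments": 1.0, "identifiers": 1.0}
    first_arrays = extract_elements(first_source)
    second_arrays = extract_elements(second_source)
    scores = {
        kind: intermediate_score(first_arrays[kind], second_arrays[kind])
        for kind in weights
    }
    total = sum(weights[k] * scores[k] for k in weights) / sum(weights.values())
    return scores, total


if __name__ == "__main__":
    file_a = "int total = 0; // running sum\nreturn total;"
    file_b = "int total = 0; // running sum\nreturn count;"
    scores, total = combined_score(file_a, file_b)
    print(scores)
    print(round(total, 3))
```

In this sketch the combined score lies between 0 and 1, and a caller could compare it against a chosen threshold to flag a pair of files for review. The abstract does not specify the similarity measure or how the intermediate scores are combined, so both choices here are placeholders.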