Title of invention SIMILAR DATA RETRIEVAL DEVICE AND PROGRAM FOR THE SAME
Abstract PROBLEM TO BE SOLVED: To detect sets of files with similar content more accurately. SOLUTION: Tentative division positions are determined so as to split each file into a predetermined number of equal-size constituent segments. The data preceding or following each tentative division position is read, and the location where a specific pattern is detected is taken as the definite division position. The file is divided at the definite division positions, and a hash value is calculated for each resulting constituent segment. To judge the similarity between two files, the characteristic value (hash value) of each constituent segment of one file is compared in order, segment by segment, with that of the other file, and the number or ratio of constituent segments whose values match is counted. The larger that number or ratio, the higher the degree of similarity. COPYRIGHT: (C)2011,JPO&INPIT
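The procedure in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the newline byte as the "specific pattern", the 64-byte search window, the forward-only search, the fallback to the tentative position, and SHA-256 as the hash are all hypothetical choices that the abstract does not specify.

```python
import hashlib
from typing import List


def split_positions(data: bytes, n_segments: int,
                    pattern: bytes = b"\n", window: int = 64) -> List[int]:
    """Return definite division positions near the equal-size tentative ones."""
    positions: List[int] = []
    for k in range(1, n_segments):
        # Tentative position: boundary of the k-th equal-size segment.
        tentative = len(data) * k // n_segments
        # Assumption: only the data FOLLOWING the tentative position is
        # searched, and only within `window` bytes.
        end = min(len(data), tentative + window)
        hit = data.find(pattern, tentative, end)
        # Assumption: cut just after the pattern; if no pattern is found,
        # fall back to the tentative position.
        cut = hit + len(pattern) if hit != -1 else tentative
        # Keep positions non-decreasing so every segment is well defined.
        if positions:
            cut = max(cut, positions[-1])
        positions.append(cut)
    return positions


def segment_hashes(data: bytes, n_segments: int) -> List[str]:
    """Hash each constituent segment obtained by cutting at the positions."""
    cuts = [0] + split_positions(data, n_segments) + [len(data)]
    return [hashlib.sha256(data[a:b]).hexdigest()
            for a, b in zip(cuts, cuts[1:])]


def similarity(a: bytes, b: bytes, n_segments: int = 4) -> float:
    """Ratio of segments whose hashes match when compared in order."""
    ha = segment_hashes(a, n_segments)
    hb = segment_hashes(b, n_segments)
    matches = sum(x == y for x, y in zip(ha, hb))
    return matches / n_segments


# Eight 10-byte lines; the second file differs in one line only.
file_a = b"".join(b"line-%04d\n" % i for i in range(8))
file_b = file_a.replace(b"line-0005", b"LINE-0005")
print(similarity(file_a, file_a))  # 1.0
print(similarity(file_a, file_b))  # 0.75
```

In this example the changed line falls inside one of the four segments, so three of the four segment hashes still match and the ratio is 0.75.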
Publication No. JP2010256951(A) Publication date 2010.11.11
Application No. JP20090102704 Filing date 2009.04.21
Applicant DATA HENKAN KENKYUSHO:KK Inventor HATANAKA TOYOJI
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim
Address