Title of Invention
Abstract PROBLEM TO BE SOLVED: To classify extracted arbitrary character strings appropriately into technical expressions used in a specific field and general expressions used regardless of field, by determining the importance of each character string from both its importance within a document and its importance across the entire set of document files. SOLUTION: A character string extraction part 21 reads the document file 31 and, for an arbitrary text document included in the document file 31, extracts the character strings (n-gram character strings) of every length that begin at each character in the document. An importance calculation part 22 calculates the within-document importance and the between-document importance of each n-gram character string extracted by the character string extraction part 21, and obtains the final importance of the character string by weighting these two values. A character string classification part 23 classifies each extracted character string, based on the importance assigned to it by the importance calculation part 22, as either a technical expression used in the specific field or a general expression commonly used in ordinary documents regardless of field.
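The three parts described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the importance formulas, the weighting, or the classification rule, so everything specific here is an assumption made for illustration: within-document importance is taken as relative frequency, between-document importance as a logarithmic inverse document frequency, the final importance as a weighted sum controlled by a hypothetical `alpha`, and classification as a single hypothetical `threshold`. The function names `extract_ngrams`, `score_strings`, and `classify` are likewise hypothetical.

```python
import math
from collections import Counter


def extract_ngrams(text, max_len=None):
    """Return every substring of length 1..max_len starting at each character."""
    grams = []
    for start in range(len(text)):
        remaining = len(text) - start
        # With max_len=None, strings of all lengths are extracted.
        longest = remaining if max_len is None else min(max_len, remaining)
        for length in range(1, longest + 1):
            grams.append(text[start:start + length])
    return grams


def score_strings(documents, max_len=None, alpha=0.5):
    """Assign each extracted string a final importance (assumed formulas)."""
    per_doc = [Counter(extract_ngrams(doc, max_len)) for doc in documents]

    # Number of documents in which each string occurs at least once.
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())

    scores = {}
    for counts in per_doc:
        total = sum(counts.values())
        for gram, count in counts.items():
            # Assumption: within-document importance is relative frequency.
            within = count / total
            # Assumption: between-document importance is log inverse
            # document frequency (0 when the string is in every document).
            between = math.log(len(documents) / doc_freq[gram])
            # Assumption: the final importance is a weighted sum of the two.
            final = alpha * within + (1 - alpha) * between
            # Keep the highest score the string reaches in any document.
            scores[gram] = max(final, scores.get(gram, 0.0))
    return scores


def classify(scores, threshold):
    """Assumption: strings at or above the threshold count as technical."""
    return {
        gram: "technical" if score >= threshold else "general"
        for gram, score in scores.items()
    }
```

Under these assumptions, a string that is frequent in one document but rare across the collection scores high and is labeled technical, while a string found in most documents scores low and is labeled general. Extracting all lengths is quadratic in document length, so the optional `max_len` cap is a practical convenience of this sketch rather than something stated in the abstract.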
Publication No. JP3609252(B2) Publication Date 2005.01.12
Application No. JP19980073920 Filing Date 1998.03.23
Applicant Inventor
Classification G06F17/27;G06F17/30 Main Classification G06F17/27
Agency Agent
Independent Claim
Address