Title of invention: A CONSISTENCY CHECKER FOR DOCUMENTS CONTAINING JAPANESE TEXT
Abstract: <p>A Consistency Checker provides an improved method of analyzing a Japanese text document to identify inconsistently spelled words. The Consistency Checker utilizes a Reading Pair Database (RPD) and a Compressed Lexicon Database (CLD) to determine the reading units within a word, to calculate a Reading Pair Identification Number (RID) for each reading unit, to calculate a Sense Identification Number (SID) for each word, and to calculate a Spelling Variant Identification Number (SVID) for each word. Spelling variants are generated by combining variations of individual RIDs in the RID array. A Registry is updated to maintain statistics on all of the words within the document. An error field within the Registry indicates that the document contains more than one spelling variant of a particular word. The client program can access the Registry to alert a user to inconsistencies discovered in the document. The RPD comprises a list of reading pairs correlating Japanese text reading units of one character set with equivalent Japanese text reading units of another character set. Equivalent reading units from each character set are combined to form a reading pair and each reading pair is assigned an RID. A method is provided for generating the RPD by analyzing a list of Japanese words and a list of Japanese word equivalents having different spellings. Reading units are discovered by splitting the words at common dividing points and eliminating low-occurrence reading units until a set of high-occurrence reading units is defined.</p>
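The Registry behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract derives each SID and SVID from the RPD and CLD, whereas this sketch hard-codes them in a hypothetical `TOY_LEXICON`. The names `RegistryEntry`, `build_registry`, and `inconsistencies` are likewise assumptions made for illustration. The sketch shows only the final step, in which an entry's error flag is set once the document contains more than one spelling variant of the same word.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical toy lexicon: surface spelling -> (SID, SVID).
# The abstract computes these IDs from the RPD and CLD; here they are
# hard-coded assumptions so the example stays self-contained.
TOY_LEXICON = {
    "子供": (1, 0),      # "child", kanji spelling
    "子ども": (1, 1),    # same word, mixed kanji/hiragana spelling
    "こども": (1, 2),    # same word, hiragana spelling
    "引っ越し": (2, 0),  # "moving house"
    "引越し": (2, 1),    # same word, shorter okurigana
}


@dataclass
class RegistryEntry:
    """Statistics for one word sense (one SID)."""
    counts: dict = field(default_factory=lambda: defaultdict(int))  # SVID -> occurrences
    spellings: dict = field(default_factory=dict)                   # SVID -> surface form

    @property
    def error(self) -> bool:
        # Mirrors the "error field": more than one spelling variant was seen.
        return len(self.counts) > 1


def build_registry(words):
    """Tally spelling variants per SID for an already segmented word list."""
    registry = defaultdict(RegistryEntry)
    for word in words:
        ids = TOY_LEXICON.get(word)
        if ids is None:
            continue  # words outside the toy lexicon are ignored in this sketch
        sid, svid = ids
        entry = registry[sid]
        entry.counts[svid] += 1
        entry.spellings[svid] = word
    return registry


def inconsistencies(registry):
    """Return {SID: [spellings ordered by SVID]} for entries whose error flag is set."""
    return {
        sid: [entry.spellings[svid] for svid in sorted(entry.spellings)]
        for sid, entry in registry.items()
        if entry.error
    }


if __name__ == "__main__":
    document_words = ["子供", "子ども", "引っ越し", "子供"]
    print(inconsistencies(build_registry(document_words)))
```

A client program would read the flagged entries and prompt the user to choose one spelling. Word segmentation and the ID calculations themselves are outside the scope of this sketch.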
Publication number: WO1999067724(A1) Publication date: 1999.12.29
Application number: US1999014111 Filing date: 1999.06.23
Applicant: Inventor:
Classification number: Main classification number:
Agency: Agent:
Principal claim:
Address: