Title of invention: Compound word breaker and spell checker
Abstract: A method of determining the component words of a compound word is disclosed. The method identifies the component words by comparing the word with a list of words found in a lexicon. If the word is not found in the lexicon, the method analyzes the word character by character. After each character, the method looks up the characters selected so far in the lexicon. If a match is found, it is added to a hypothesis trace in a lattice. Next, the method checks whether the remaining characters form a valid entry in the lexicon, and whether that entry is allowed to be a final segment. All encountered component words are entered into the lattice, possibly creating more than one hypothesis path. Some paths may be rendered invalid if they do not contain the required "seg1" annotation for non-final segments or if they encounter an "anti-seg" bit for a presumed final segment. The output can be ranked if more than one valid segmentation is found. The method can also correct spelling errors due to incorrect compounding.
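The segmentation procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy lexicon, the function name `segment`, the representation of "seg1" and "anti-seg" as string flags, and the ranking rule (fewer segments first) are all assumptions made for illustration. The abstract does not state how valid segmentations are ranked, and the spelling-correction step is not covered here.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon. "seg1" marks an entry that may be a non-final
# segment; "anti-seg" marks an entry that may not be a final segment.
LEXICON: Dict[str, Set[str]] = {
    "book": {"seg1"},
    "books": set(),        # no "seg1": cannot start a compound
    "keep": {"seg1"},
    "keeper": set(),
    "er": {"anti-seg"},    # not allowed as a final segment
    "helf": {"anti-seg"},
    "shelf": set(),
    "store": set(),
}


def segment(word: str, lexicon: Dict[str, Set[str]]) -> List[List[str]]:
    """Return every valid segmentation of `word`, shortest first."""
    if word in lexicon:
        # The whole word is a lexicon entry, so no breaking is needed.
        return [[word]]

    results: List[List[str]] = []

    def extend(start: int, path: List[str]) -> None:
        # Grow the candidate one character at a time and look it up.
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            flags = lexicon.get(piece)
            if flags is None:
                continue
            if end == len(word):
                # Presumed final segment: reject it if flagged "anti-seg".
                if path and "anti-seg" not in flags:
                    results.append(path + [piece])
            elif "seg1" in flags:
                # A non-final segment must carry the "seg1" annotation.
                extend(end, path + [piece])

    extend(0, [])
    # Assumed ranking: prefer segmentations with fewer component words.
    results.sort(key=len)
    return results


print(segment("bookshelf", LEXICON))
print(segment("bookkeeper", LEXICON))
```

In this sketch the recursion plays the role of the lattice: each call to `extend` continues one hypothesis path, and a path is dropped as soon as a non-final piece lacks "seg1" or the final piece carries "anti-seg". For "bookkeeper", the path book + keep + er is discarded because "er" is flagged "anti-seg", leaving book + keeper.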
Publication number: US2005091030(A1)  Publication date: 2005.04.28
Application number: US20040804930  Filing date: 2004.03.19
Applicant: MICROSOFT CORPORATION  Inventors: JESSEE ANDREA M.; ECKERT MIRIAM R.; POWELL KEVIN R.
Classification: G06F17/26; G06F17/27; G10L15/18; (IPC1-7): G06F17/28  Main classification: G06F17/26
Agency:  Agent:
Main claim:
Address: