Title of invention: METHOD FOR SEGMENTING NON-SEGMENTED TEXT USING SYNTACTIC PARSE
Abstract: Embodiments of the present invention provide a method and apparatus for segmenting text by providing orthographic variations (308) and inflectional variations (306) to a syntactic parser (316). Under the present invention, possible segments are first identified in an input sequence of characters. At least two of the identified segments overlap each other. For at least one of the segments, an alternative sequence of characters is identified. In some cases, this alternative sequence is formed through inflectional morphology (306), which identifies a different lexical form for a word identified by the segment. In other cases, the alternative sequence represents an orthographic variant (308) of a word identified by the segment. The identified segments and the alternative sequences are then passed to a syntactic analyzer (316), which produces one or more syntactic parses. The segments found in the resulting parses represent the segmentation of the input sequence of characters.
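Publication number: WO0137127(A3)  Publication date: 2002.03.21
Application number: WO2000US41750  Filing date: 2000.11.01
Applicant: MICROSOFT CORPORATION  Inventors: BROCKETT, CHRISTOPHER, J.; KACMARCIK, GARY, J.; SUZUKI, HISAMI
Classification: G06F17/27;(IPC1-7):G06F17/27  Main classification: G06F17/27
Agency:  Agent:
Principal claim:
Address: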
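
The flow described in the abstract (find overlapping candidate segments, attach alternative character sequences, then let a later stage decide which segments survive) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the toy `LEXICON`, the `LEMMAS` and `SPELLINGS` tables standing in for inflectional morphology (306) and orthographic variants (308), and the `covering_paths` function. In particular, `covering_paths` only enumerates segment sequences that span the whole input; it is a stand-in for the syntactic analyzer (316), which would additionally reject sequences that do not form a valid parse.

```python
from dataclasses import dataclass

# Hypothetical toy data; a real system would use a full lexicon and
# morphological rules for the target language.
LEXICON = {"the", "cat", "cats", "sat", "at"}
LEMMAS = {"cats": "cat", "sat": "sit"}   # inflected form -> lexical form
SPELLINGS = {"cat": "kat"}               # surface form -> spelling variant


@dataclass(frozen=True)
class Segment:
    start: int            # index of the first character
    end: int              # index one past the last character
    surface: str          # characters as they appear in the input
    alternatives: tuple   # alternative character sequences for this segment


def find_segments(text):
    """Return every lexicon match in text; matches may overlap."""
    segments = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            surface = text[start:end]
            if surface in LEXICON:
                alternatives = tuple(
                    table[surface]
                    for table in (LEMMAS, SPELLINGS)
                    if surface in table
                )
                segments.append(Segment(start, end, surface, alternatives))
    return segments


def covering_paths(segments, length):
    """Enumerate segment sequences that cover positions 0..length exactly.

    Stand-in for the syntactic analyzer: it checks coverage only, not grammar.
    """
    by_start = {}
    for segment in segments:
        by_start.setdefault(segment.start, []).append(segment)

    def extend(position):
        if position == length:
            yield []
            return
        for segment in by_start.get(position, []):
            for rest in extend(segment.end):
                yield [segment] + rest

    return list(extend(0))


if __name__ == "__main__":
    text = "thecatsat"
    for path in covering_paths(find_segments(text), len(text)):
        print([(s.surface, s.alternatives) for s in path])
```

For the input `thecatsat`, the candidates `cat` and `cats` overlap, so two full-coverage sequences remain (`the | cat | sat` and `the | cats | at`). Choosing between such sequences is the job the abstract assigns to the syntactic parse.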