Title of Invention |
METHOD FOR SEGMENTING NON-SEGMENTED TEXT USING SYNTACTIC PARSE |
Abstract |
Embodiments of the present invention provide a method and apparatus for segmenting text by providing orthographic (308) and inflectional (306) variations to a syntactic parser (316). Under the present invention, possible segments are first identified in the sequence of characters, and at least two of the identified segments overlap each other. For at least one of the segments, an alternative sequence of characters is identified. In some cases, this alternative sequence is formed through inflectional morphology (306), which identifies a different lexical form for the word identified by the segment. In other cases, the alternative sequence represents an orthographic variant (308) of that word. The identified segments and the alternative sequences are then passed to a syntactic analyzer (316), which produces one or more syntactic parses. The segments found in the resulting parses represent the segmentation of the input sequence of characters.
|
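The abstract outlines a pipeline: find overlapping candidate segments, attach alternative forms (inflectional or orthographic variants), and let a syntactic parser choose which segments survive. The sketch below illustrates that flow under loudly stated assumptions. The toy lexicon, the variant table, the string `"themanran"` (English with spaces removed, standing in for non-segmented text), and every function name are hypothetical. The "parser" is only a stand-in that accepts one fixed category sequence; it is not the patent's syntactic analyzer (316).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data; none of it comes from the patent.
SURFACE_FORMS = {"the", "them", "man", "an", "a", "ran"}
# Alternative forms per surface string (here one inflectional lemma).
VARIANTS: Dict[str, List[str]] = {"ran": ["run"]}
# Categories are stored for base forms only, so "ran" needs its variant.
CATEGORIES: Dict[str, str] = {
    "the": "DET", "them": "PRON", "man": "NOUN",
    "an": "DET", "a": "DET", "run": "VERB",
}
# Stand-in grammar: the only sentence shape the toy "parser" accepts.
GRAMMAR = {("DET", "NOUN", "VERB")}


@dataclass(frozen=True)
class Segment:
    start: int
    end: int
    surface: str
    alternatives: Tuple[str, ...]


def find_segments(text: str) -> List[Segment]:
    """Collect every known surface form in text; segments may overlap."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            surface = text[start:end]
            if surface in SURFACE_FORMS:
                alts = tuple(VARIANTS.get(surface, []))
                found.append(Segment(start, end, surface, alts))
    return found


def category(seg: Segment) -> Optional[str]:
    """Look up the surface form first, then each alternative form."""
    for form in (seg.surface, *seg.alternatives):
        if form in CATEGORIES:
            return CATEGORIES[form]
    return None


def covering_paths(text: str, segments: List[Segment]) -> List[List[Segment]]:
    """Enumerate segment sequences that tile text from start to end."""
    by_start: Dict[int, List[Segment]] = {}
    for seg in segments:
        by_start.setdefault(seg.start, []).append(seg)
    paths: List[List[Segment]] = []

    def walk(pos: int, acc: List[Segment]) -> None:
        if pos == len(text):
            paths.append(list(acc))
            return
        for seg in by_start.get(pos, []):
            acc.append(seg)
            walk(seg.end, acc)
            acc.pop()

    walk(0, [])
    return paths


def segment(text: str) -> List[List[str]]:
    """Keep only tilings whose category sequence the toy grammar accepts."""
    results = []
    for path in covering_paths(text, find_segments(text)):
        if tuple(category(s) for s in path) in GRAMMAR:
            results.append([s.surface for s in path])
    return results


print(segment("themanran"))  # [['the', 'man', 'ran']]
```

Here "the"/"them" and "man"/"an" overlap, so two tilings cover the input ("the man ran" and "them an ran"). Only the first matches the toy grammar, and it does so only because "ran" carries the alternative form "run", whose category is known. Dropping the variant table leaves no accepted parse, which mirrors the abstract's point that alternative forms must reach the parser.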
Publication Number |
WO0137127(A3) |
Publication Date |
2002.03.21 |
Application Number |
WO2000US41750 |
Application Date |
2000.11.01 |
Applicant |
MICROSOFT CORPORATION |
Inventors |
BROCKETT, CHRISTOPHER, J.;KACMARCIK, GARY, J.;SUZUKI, HISAMI |
Classification |
G06F17/27;(IPC1-7):G06F17/27 |
Main Classification |
G06F17/27 |