Title of invention |
Lattice and method for identifying and normalizing orthographic variations in Japanese text |
Abstract |
A lattice data structure suitable for storage on a computer-readable medium is provided, which represents a plurality of orthographic forms of a Japanese lexical entry. The lattice includes a plurality of data fields, each adapted to hold data representing a word element of the entry. Each data field includes a first subfield containing data representing a primary form of the corresponding word element and a second subfield containing data representing an alternate form of the corresponding word element. Also provided is a method of normalizing Japanese lexical entries to produce a normalized form that includes the primary form of each word-element representation of the lattice and excludes the alternate forms. Also provided are methods of segmenting text using the disclosed lattice.
|
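A minimal Python sketch of the lattice the abstract describes, assuming each word element holds one primary form plus a list of alternate forms, and that the normalized form is simply the concatenation of the primary forms. The names `WordElement`, `Lattice`, `build_normalization_table` and `segment`, and the sample entry 引き落とし (split so that the okurigana き and と are optional), are hypothetical illustrations rather than the patented implementation. The greedy longest-match segmenter is likewise a stand-in, not the segmentation method claimed in the patent.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class WordElement:
    """One data field of the lattice (hypothetical layout)."""
    primary: str                                          # first subfield: primary form
    alternates: list[str] = field(default_factory=list)   # second subfield: alternate forms

    def forms(self) -> list[str]:
        return [self.primary, *self.alternates]


@dataclass
class Lattice:
    """A Japanese lexical entry as a sequence of word elements."""
    elements: list[WordElement]

    def normalized(self) -> str:
        # Normalized form: primary form of every element, no alternates.
        return "".join(e.primary for e in self.elements)

    def variants(self) -> set[str]:
        # Every orthographic form obtained by picking one form per element.
        return {"".join(c) for c in product(*(e.forms() for e in self.elements))}


def build_normalization_table(lattices: list[Lattice]) -> dict[str, str]:
    # Map each orthographic variant to its entry's normalized form.
    table: dict[str, str] = {}
    for lat in lattices:
        norm = lat.normalized()
        for variant in lat.variants():
            table[variant] = norm
    return table


def segment(text: str, table: dict[str, str]) -> list[str]:
    # Illustrative greedy longest-match segmentation (an assumption, not
    # the patent's method): known variants are emitted in normalized form,
    # unknown characters are passed through one at a time.
    max_len = max(map(len, table), default=1)
    out: list[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in table:
                out.append(table[piece])
                i += n
                break
        else:
            out.append(text[i])
            i += 1
    return out


if __name__ == "__main__":
    hikiotoshi = Lattice([
        WordElement("引"), WordElement("き", [""]), WordElement("落"),
        WordElement("と", [""]), WordElement("し"),
    ])
    table = build_normalization_table([hikiotoshi, Lattice([WordElement("口座")])])
    print(sorted(hikiotoshi.variants()))
    print(segment("口座から引落しです", table))
```

With this encoding, one lattice covers 引き落とし, 引落とし, 引き落し and 引落し, and all four map to the single normalized form 引き落とし. Modelling an omitted okurigana as an empty-string alternate is one possible design choice; a real lexicon could also need kana/kanji alternates per element.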
Publication number |
US6731802(B1) |
Publication date |
2004.05.04 |
Application number |
US20000563636 |
Filing date |
2000.05.02 |
Applicant |
MICROSOFT CORPORATION |
Inventors |
KACMARCIK GARY;BROCKETT CHRISTOPHER J. |
Classification |
G06F17/27;(IPC1-7):G06K9/18;G06K9/72;G06F7/00 |
Main classification |
G06F17/27 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|