Title of invention |
Method and apparatus using source-channel models for word segmentation |
Abstract |
A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
|
Application publication number |
US2004243408(A1) |
Application publication date |
2004.12.02 |
Application number |
US20030448644 |
Filing date |
2003.05.30 |
Applicant |
MICROSOFT CORPORATION |
Inventors |
GAO JIANFENG;LI MU;HUANG CHANG-NING;SUN JIAN;ZHANG LEI;ZHOU MING |
Classification numbers |
G06F17/27;(IPC1-7):G10L15/12 |
Main classification number |
G06F17/27 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|
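
Illustrative note on the abstract: the abstract describes two probabilistic models, one for the likelihood of a sequence of entity types and one for the likelihood of a character sequence given an entity type. A minimal sketch of how such a pair of models could jointly pick a segmentation is given below. It is not the claimed method. The class-bigram table `CLASS_BIGRAM`, the per-class table `CLASS_MODEL`, the entity type names, the function `segment`, and every probability value are hypothetical and chosen only for illustration. The second pass that identifies organization names is not modeled.

```python
import math

# Hypothetical "source" model: P(class_i | class_{i-1}) over entity types.
CLASS_BIGRAM = {
    ("<s>", "WORD"): 0.7, ("<s>", "PERSON"): 0.3,
    ("WORD", "WORD"): 0.6, ("WORD", "PERSON"): 0.4,
    ("PERSON", "WORD"): 0.9, ("PERSON", "PERSON"): 0.1,
}

# Hypothetical "channel" model: P(character string | class).
CLASS_MODEL = {
    "WORD": {"ab": 0.4, "a": 0.1, "b": 0.1, "c": 0.2, "cd": 0.2},
    "PERSON": {"cd": 0.8, "d": 0.2},
}


def segment(text, max_len=4):
    """Return the most probable list of (chunk, class) pairs, or None.

    Dynamic programming over end positions: best[i][c] holds the best
    log probability of any segmentation of text[:i] whose last segment
    has class c, plus a backpointer (start of that segment, previous class).
    """
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            chunk = text[start:end]
            for cls, emissions in CLASS_MODEL.items():
                p_emit = emissions.get(chunk)
                if p_emit is None:
                    continue  # this class cannot generate this chunk
                for prev_cls, (prev_score, _) in best[start].items():
                    p_trans = CLASS_BIGRAM.get((prev_cls, cls))
                    if p_trans is None:
                        continue
                    score = prev_score + math.log(p_trans) + math.log(p_emit)
                    if cls not in best[end] or score > best[end][cls][0]:
                        best[end][cls] = (score, (start, prev_cls))
    if not best[len(text)]:
        return None  # no segmentation is possible under the toy models
    # Follow backpointers from the best final class to recover the segments.
    pos = len(text)
    cls = max(best[pos], key=lambda c: best[pos][c][0])
    segments = []
    while pos > 0:
        _, (start, prev_cls) = best[pos][cls]
        segments.append((text[start:pos], cls))
        pos, cls = start, prev_cls
    segments.reverse()
    return segments


print(segment("abcd"))
```

Choosing the entity type sequence fixes the segment boundaries at the same time, which is the sense in which identifying the entity types "thereby identifies a segmentation". With the toy tables above, the call splits `"abcd"` into `"ab"` tagged `WORD` and `"cd"` tagged `PERSON`.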