Title: Method and apparatus using source-channel models for word segmentation
Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
Publication number: US2004243408(A1) Publication date: 2004.12.02
Application number: US20030448644 Filing date: 2003.05.30
Applicant: MICROSOFT CORPORATION Inventors: GAO JIANFENG;LI MU;HUANG CHANG-NING;SUN JIAN;ZHANG LEI;ZHOU MING
Classification: G06F17/27;(IPC1-7):G10L15/12 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address:
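
Illustrative sketch (not part of the patent record): the abstract describes choosing the entity-type sequence that maximizes the product of a source model (likelihood of the entity sequence) and a channel model (likelihood of the characters given each entity), which in turn fixes the segmentation. The Python below is one possible minimal reading of that idea, not the claimed method. The entity labels WORD and NAME, the bigram form of the source model, the function name segment, and every probability value are assumptions invented for illustration.

```python
import math

# Hypothetical channel model: P(character string | entity type).
# All numbers are made up for this sketch.
CHANNEL = {
    "WORD": {"ab": 0.4, "a": 0.2, "b": 0.2, "c": 0.2},
    "NAME": {"bc": 0.9, "c": 0.1},
}

# Hypothetical source model: bigram P(entity type | previous entity type).
# "<s>" marks the start of the sequence.
SOURCE = {
    ("<s>", "WORD"): 0.8, ("<s>", "NAME"): 0.2,
    ("WORD", "WORD"): 0.5, ("WORD", "NAME"): 0.5,
    ("NAME", "WORD"): 0.7, ("NAME", "NAME"): 0.3,
}


def segment(chars, max_len=4):
    """Return the best [(piece, entity_type), ...] for chars, or None."""
    # best[(pos, etype)] = (log score, backpointer) for the best path that
    # covers chars[:pos] and ends with an entity of type etype.
    best = {(0, "<s>"): (0.0, None)}
    for end in range(1, len(chars) + 1):
        for start in range(max(0, end - max_len), end):
            piece = chars[start:end]
            for etype, emissions in CHANNEL.items():
                p_emit = emissions.get(piece)
                if not p_emit:
                    continue  # this entity type cannot generate the piece
                for (pos, prev), (score, _) in list(best.items()):
                    if pos != start:
                        continue
                    p_trans = SOURCE.get((prev, etype))
                    if not p_trans:
                        continue
                    cand = score + math.log(p_trans) + math.log(p_emit)
                    key = (end, etype)
                    if key not in best or cand > best[key][0]:
                        best[key] = (cand, (start, prev, piece))

    # Pick the highest-scoring state that covers the whole input.
    finals = [(score, key) for key, (score, _) in best.items()
              if key[0] == len(chars)]
    if not finals:
        return None
    _, key = max(finals)

    # Follow backpointers to recover the pieces and their entity types.
    result = []
    while best[key][1] is not None:
        start, prev, piece = best[key][1]
        result.append((piece, key[1]))
        key = (start, prev)
    return result[::-1]


print(segment("abc"))
```

With these toy tables, segment("abc") returns [("a", "WORD"), ("bc", "NAME")]: that path scores 0.8 × 0.2 × 0.5 × 0.9 = 0.072, higher than any alternative split. The second pass over organization names mentioned in the abstract is not modeled here.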