Title of invention |
CLUSTERING OF TEXT FOR STRUCTURING OF TEXT DOCUMENTS AND TRAINING OF LANGUAGE MODELS |
Abstract |
The present invention relates to a method, a text segmentation system and a computer program product for clustering text into text clusters, each representing a distinct semantic meaning. The text clustering method identifies text portions and assigns them to different clusters in such a way that each text cluster refers to one or several semantic topics. The clustering method incorporates an optimization procedure based on a re-clustering procedure that evaluates a target function indicative of the correlation between a text unit and a cluster. The text clustering method makes use of a text emission model and a cluster transition model, and makes further use of various smoothing techniques. |
Publication number |
WO2005050473(A3) |
Publication date |
2006.07.20 |
Application number |
WO2004IB52406 |
Filing date |
2004.11.12 |
Applicant |
PHILIPS INTELLECTUAL PROPERTY & STANDARDS GMBH;KONINKLIJKE PHILIPS ELECTRONICS N. V.;PETERS, JOCHEN |
Inventor |
PETERS, JOCHEN |
Classification number |
G06F17/27;G06F17/30 |
Main classification number |
G06F17/27 |
Patent agency |
|
Patent agent |
|
Principal claim |
|
Address |
|