Title: TEXT SEGMENTATION AND TOPIC ANNOTATION FOR DOCUMENT STRUCTURING
Abstract: The invention relates to a method, a computer program product and a computer system for structuring unstructured text using statistical models trained on annotated training data. The text is segmented into sections, and each section is assigned a topic that is associated with a set of labels. The statistical models for segmenting the text and for assigning a topic and its associated labels to each section explicitly account for correlations between a section of text and a topic, topic transitions between sections, the position of a topic within the document, and a (topic-dependent) section length. Structural information in the training data is thus exploited to segment and annotate unknown text.
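The four model components named in the abstract (section-topic correlation, topic transition, topic position, and topic-dependent section length) can be illustrated with a small dynamic-programming sketch. This is not the method claimed in the patent. It is a toy illustration under stated assumptions: add-one smoothed count tables, a simple sum of log-probabilities as the section score, and hypothetical names throughout (`train`, `segment`, `section_score`, `MAX_LEN`, `MAX_POS`), along with invented training sentences.

```python
import math
from collections import Counter

MAX_LEN = 4  # assumed cap on sentences per section
MAX_POS = 8  # assumed cap on section ordinal; later sections share the last bin

def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log-probability of `key` under a count table."""
    return math.log((counts[key] + alpha) / (sum(counts.values()) + alpha * n_outcomes))

def train(docs):
    """docs: list of documents, each a list of (topic, sentences) sections;
    each sentence is a list of tokens."""
    topics = sorted({t for doc in docs for t, _ in doc})
    model = {
        "topics": topics,
        "vocab": {w for doc in docs for _, sents in doc for s in sents for w in s},
        "words": {t: Counter() for t in topics},           # text/topic correlation
        "trans": {t: Counter() for t in [None] + topics},  # None = document start
        "length": {t: Counter() for t in topics},          # topic-dependent length
        "pos": {t: Counter() for t in topics},             # topic position
    }
    for doc in docs:
        prev = None
        for k, (topic, sents) in enumerate(doc):
            for s in sents:
                model["words"][topic].update(s)
            model["trans"][prev][topic] += 1
            model["length"][topic][min(len(sents), MAX_LEN)] += 1
            model["pos"][topic][min(k, MAX_POS - 1)] += 1
            prev = topic
    return model

def section_score(model, sents, topic, prev_topic, k):
    """Sum of the four log-probability terms for one candidate section."""
    n_words = len(model["vocab"]) + 1  # one extra outcome reserved for unseen words
    score = sum(log_prob(model["words"][topic], w, n_words) for s in sents for w in s)
    score += log_prob(model["trans"][prev_topic], topic, len(model["topics"]))
    score += log_prob(model["length"][topic], len(sents), MAX_LEN)
    score += log_prob(model["pos"][topic], min(k, MAX_POS - 1), MAX_POS)
    return score

def segment(sents, model):
    """Return the best-scoring list of (start, end, topic) sections."""
    n = len(sents)
    # states[j][(k, t)] = (score, backpointer): sents[:j] ends with section k of topic t
    states = [dict() for _ in range(n + 1)]
    states[0][(-1, None)] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - MAX_LEN), j):
            for (pk, pt), (pscore, _) in states[i].items():
                for t in model["topics"]:
                    score = pscore + section_score(model, sents[i:j], t, pt, pk + 1)
                    if score > states[j].get((pk + 1, t), (-math.inf, None))[0]:
                        states[j][(pk + 1, t)] = (score, (i, pk, pt))
    # Pick the best final state, then follow backpointers to the document start.
    (_, topic), (_, back) = max(states[n].items(), key=lambda kv: kv[1][0])
    sections, j = [], n
    while back is not None:
        i, pk, pt = back
        sections.append((i, j, topic))
        j, topic, back = i, pt, states[i][(pk, pt)][1]
    return sections[::-1]

# Invented toy training data, for illustration only.
DOCS = [
    [("history", [["patient", "reports", "cough"], ["no", "prior", "surgery"]]),
     ("findings", [["lungs", "are", "clear"], ["heart", "size", "normal"]]),
     ("conclusion", [["no", "acute", "disease"]])],
    [("history", [["patient", "reports", "fever"]]),
     ("findings", [["lungs", "show", "infiltrate"], ["heart", "size", "normal"]]),
     ("conclusion", [["findings", "suggest", "pneumonia"]])],
]
model = train(DOCS)
new_text = [["patient", "reports", "fever"], ["lungs", "are", "clear"],
            ["heart", "size", "normal"], ["no", "acute", "disease"]]
segments = segment(new_text, model)
print(segments)
```

Each returned tuple gives the half-open sentence range of a section and its assigned topic. The search tracks the section ordinal in the state so that the position term can be scored exactly; a production system would likely use a richer text model and prune the search space.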
Publication number: WO2005050472(A3)  Publication date: 2006.07.20
Application number: WO2004IB52404  Filing date: 2004.11.12
Applicants: PHILIPS INTELLECTUAL PROPERTY & STANDARDS GMBH;KONINKLIJKE PHILIPS ELECTRONICS N. V.;PETERS, JOCHEN;MEYER, CARSTEN;KLAKOW, DIETRICH;MATUSOV, EVGENY  Inventors: PETERS, JOCHEN;MEYER, CARSTEN;KLAKOW, DIETRICH;MATUSOV, EVGENY
Classification: G06F17/27;G06F17/30  Main classification: G06F17/27
Agency:  Agent:
Main claim:
Address: