Title of invention APPARATUS AND METHOD FOR TEXT SEGMENTATION BASED ON COHERENT UNITS
Abstract The invention provides a text segmentation apparatus comprising means for analyzing an electronic text to determine, for each sentence end in the text, a likelihood of being a segmentation point based on coherent units, and means for segmenting the text into text segments based on that likelihood. When the size of any resulting text segment exceeds a threshold value determined from the specified text segmentation size, the apparatus is programmed to segment that text segment further at the position within it that has the best likelihood of being a segmentation point. In particular, the apparatus sets up a pair of windows on the left and right sides of each sentence end position in the text and determines the similarity between the text parts contained in the two windows, thereby obtaining similarity curves. The apparatus then determines the likelihood of a segmentation point for each sentence end based on the obtained similarity curves. The apparatus segments the text at the point having the best likelihood, then at the point having the second-best likelihood, and so on, until the size of every text segment is approximately equal to the specified segment size.
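The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: similarity is taken as cosine similarity between bag-of-words counts, the likelihood of a segmentation point is taken as one minus that similarity at a single window width, segment size is counted in sentences, and the names `cosine`, `boundary_likelihoods`, `segment`, `window`, and `max_size` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundary_likelihoods(sentences, window=2):
    """Score every internal sentence end.

    Key i is the boundary between sentence i-1 and sentence i. The score is
    1 - similarity of the left and right windows (an assumed stand-in for
    the likelihood the abstract derives from similarity curves).
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = {}
    for i in range(1, len(sentences)):
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        scores[i] = 1.0 - cosine(left, right)
    return scores


def segment(sentences, max_size, window=2):
    """Split into (start, end) sentence ranges of at most max_size sentences.

    Any segment larger than max_size is cut at the boundary inside it with
    the best score, and the two halves are re-examined in turn.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    scores = boundary_likelihoods(sentences, window)
    pending = [(0, len(sentences))]
    done = []
    while pending:
        start, end = pending.pop()
        if end - start <= max_size:
            done.append((start, end))
            continue
        cut = max(range(start + 1, end), key=lambda i: scores[i])
        pending.extend([(start, cut), (cut, end)])
    return sorted(done)
```

For example, with the four sentences "cats purr softly", "cats sleep often", "stocks fell sharply", "stocks rose later" and `max_size=2`, the left and right windows around the second sentence end share no words, so that boundary scores highest and the sketch cuts there first.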
Publication number EP1301853(B1) Publication date 2009.07.22
Application number EP20010975645 Filing date 2001.10.02
Applicant HEWLETT-PACKARD COMPANY Inventors SHIMIZU, HIROYUKI; NAKAGAWA, SHINYA
Classification G06F17/21; G06F17/22; G06F7/60; G06F12/00; G06F15/00; G06F17/00; G06F17/10; G06F17/24; G06F17/27; G06K9/00 Main classification G06F17/21
Agency Agent
Main claim
Address