Title of invention Document text processing using edge detection
Abstract A document is received that has a plurality of lines with text. The document includes text associated with at least one topic of interest and text not associated with the at least one topic of interest. For each line in the document, a length of the line and a number of off-topic indicators are then determined, the off-topic indicators characterizing portions of the document as likely not being associated with the at least one topic of interest. A density for each line can then be determined based on the determined line length and the determined number of off-topic indicators. The determined densities for each line are used to identify portions of the document likely associated with the at least one topic of interest, so that data characterizing the identified portions of the document can be provided. Related apparatus, systems, techniques and articles are also described.
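The per-line steps in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the off-topic indicator is taken to be an HTML-like tag, the density is taken to be the number of indicators divided by the line length in characters, and on-topic lines are selected with a simple fixed threshold (`max_density`) rather than the edge-detection step named in the title, which the abstract does not detail. The names `INDICATOR`, `line_density`, and `on_topic_lines` are hypothetical.

```python
import re

# Hypothetical off-topic indicator: an HTML-like tag. The abstract does not
# say what the indicators are; this choice is an assumption for illustration.
INDICATOR = re.compile(r"<[^>]+>")


def line_density(line: str) -> float:
    """Off-topic indicators per character of the line (0.0 for an empty line).

    The ratio form is an assumption; the abstract only says the density is
    based on the line length and the number of off-topic indicators.
    """
    length = len(line)
    if length == 0:
        return 0.0
    return len(INDICATOR.findall(line)) / length


def on_topic_lines(document: str, max_density: float = 0.02) -> list[int]:
    """Return indices of non-blank lines whose density is <= max_density.

    A fixed threshold stands in for whatever the patent actually does with
    the densities; it is not the edge-detection method itself.
    """
    selected = []
    for index, line in enumerate(document.splitlines()):
        if line.strip() and line_density(line) <= max_density:
            selected.append(index)
    return selected


sample = "\n".join([
    '<a href="/home">Home</a> <a href="/news">News</a>',
    "Edge detection finds the boundary between navigation and article text.",
    "The article body tends to have long lines with few markup tags.",
    '<a href="/terms">Terms</a>',
])

print(on_topic_lines(sample))  # prints [1, 2]
```

In this sample the two navigation lines carry several tags in few characters, so their density is high and they are dropped, while the two prose lines contain no indicators and are kept.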
Publication number EP2662779(A3) Publication date 2017.01.04
Application number EP20130002087 Filing date 2013.04.22
Applicant SAP SE Inventors Botros, Sherif; Herman, David; Shami, Mohammad
Classification G06F17/22 Main classification G06F17/22
Agency Agent
Principal claim
Address