Title of invention: Parsing of text using linguistic and non-linguistic list properties
Abstract: A system and method are disclosed for extracting information from text. The method can be performed without prior knowledge as to whether the text includes a list. The method applies parser rules to a sentence spanning lines of text to identify a set of candidate list items in the sentence. Each candidate list item is assigned a set of features including one or more non-linguistic features and a linguistic feature. The linguistic feature defines a syntactic function of an element of the candidate list item that is able to be in a dependency relation with an element of an identified candidate list introducer in the same sentence. When two or more candidate list items are found with compatible sets of features, a list is generated which links these as list items of a common list introducer. Dependency relations are extracted between the list introducer and the list items, and information based on the extracted dependency relations is output.
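The linking step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented parser: the class `CandidateItem`, the function `link_list_items`, the feature names (`marker`, `indent`, `function`), and the rule that "compatible" means "identical feature tuple" are all assumptions made for illustration. The sketch also assumes that candidate list items and the list introducer have already been identified by parser rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateItem:
    """A candidate list item with hypothetical feature names."""
    text: str
    marker: str    # non-linguistic: marker style, e.g. "dash" or "number"
    indent: int    # non-linguistic: leading whitespace width
    function: str  # linguistic: syntactic function, e.g. "OBJ"

    def features(self):
        return (self.marker, self.indent, self.function)


def link_list_items(introducer_head, candidates):
    """Link candidates that share a feature set to a common introducer.

    Assumption: two candidates are "compatible" when their feature
    tuples are equal. Returns (function, introducer_head, item_text)
    triples, or an empty list when fewer than two candidates agree.
    """
    groups = {}
    for cand in candidates:
        groups.setdefault(cand.features(), []).append(cand)
    # Pick the largest group of mutually compatible candidates.
    best = max(groups.values(), key=len, default=[])
    if len(best) < 2:
        return []  # a list needs at least two compatible items
    return [(c.function, introducer_head, c.text) for c in best]


candidates = [
    CandidateItem("paper jams", "dash", 2, "OBJ"),
    CandidateItem("toner leaks", "dash", 2, "OBJ"),
    CandidateItem("See page 4.", "none", 0, "NONE"),
]
relations = link_list_items("fixes", candidates)
```

Here the two dash-marked items share the same features and are linked as objects of the hypothetical introducer head "fixes", while the third line is left out. A real system would likely use a looser compatibility test than strict equality.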
Publication number: US2012290288(A1) Publication date: 2012.11.15
Application number: US201113103263 Filing date: 2011.05.09
Applicant: AIT-MOKHTAR SALAH; XEROX CORPORATION Inventor: AIT-MOKHTAR SALAH
Classification: G06F17/27 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address: