Title: Extracting information from formatted sources
Abstract: An extraction manager extracts information from formatted input. The input is annotated with presentation information, and parsed into a set of elements comprising a canonical representation thereof. An information analyzer analyzes the elements in order to glean additional information. An entity extractor determines entities to extract from the input. The entity extractor analyzes elements according to specific entities to be extracted, and creates entity specific observations for analyzed elements. These observations comprise possible values for the relevant entities. A heuristics processor maintains a collection of entity specific heuristics, each comprising a test to help determine the suitability of data as a value for the corresponding entity. The heuristics processor selects heuristics for the entities to be extracted, and tests observations for these entities against the selected heuristics. Responsive to this testing, ordered possible values for entities to extract are determined.
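The flow described in the abstract (elements, entity specific observations, heuristic tests, ordered possible values) can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `font_size` annotation, the two example entities, the individual heuristics, and the choice to sum heuristic scores into a ranking are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Element:
    """One parsed unit of the input, annotated with presentation information."""
    text: str
    font_size: int  # hypothetical presentation annotation


@dataclass
class Observation:
    """A possible value for one entity, derived from one element."""
    entity: str
    value: str
    element: Element


# A heuristic is a test that scores how suitable an observation is (assumed form).
Heuristic = Callable[[Observation], float]

# Hypothetical collection of entity specific heuristics, keyed by entity name.
HEURISTICS: Dict[str, List[Heuristic]] = {
    "price": [
        lambda o: 1.0 if o.value.startswith("$") else 0.0,
        lambda o: 0.5 if any(c.isdigit() for c in o.value) else 0.0,
    ],
    "title": [
        lambda o: 1.0 if o.element.font_size >= 18 else 0.0,
        lambda o: 0.5 if not any(c.isdigit() for c in o.value) else 0.0,
    ],
}


def observe(elements: List[Element], entities: List[str]) -> List[Observation]:
    """Create one observation per (entity, element) pair."""
    return [Observation(entity, el.text, el) for entity in entities for el in elements]


def rank(observations: List[Observation], entity: str) -> List[str]:
    """Test each observation for `entity` against its heuristics; best value first."""
    tests = HEURISTICS[entity]
    scored = [
        (sum(test(o) for test in tests), o.value)
        for o in observations
        if o.entity == entity
    ]
    # Summing scores is an assumption; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [value for _, value in scored]


elements = [
    Element("Red Canvas Tote", 24),
    Element("$19.99", 14),
    Element("Ships in 2 days", 10),
]
observations = observe(elements, ["title", "price"])
price_ranking = rank(observations, "price")
title_ranking = rank(observations, "title")
print(price_ranking)
print(title_ranking)
```

Keeping the heuristics in a table keyed by entity mirrors the abstract's "collection of entity specific heuristics": supporting a new entity in this sketch means adding an entry to the table rather than changing the ranking code.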
Publication number: US7630968(B2) Publication date: 2009.12.08
Application number: US20060357656 Filing date: 2006.02.16
Applicant: KABOODLE, INC. Inventors: MCCAMMON KEIRON; CHANDRA MANISH
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: