Title: AUTOMATIC EXTRACTION USING MACHINE LEARNING BASED ROBUST STRUCTURAL EXTRACTORS
Abstract: A method and apparatus are disclosed for automatically extracting information from a large number of documents by applying machine learning techniques and exploiting structural similarities among the documents. A machine learning model is trained to an accuracy of at least 50%. The trained model is then used to identify information attributes in a sample of pages drawn from a cluster of structurally similar documents. A structure-specific model of the cluster is created by compiling, for each attribute the trained model identifies in the sample, a list of its top-K locations. These top-K lists are then used to extract information from the pages of the cluster from which the sample was taken.
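The pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: pages are modeled as hypothetical dictionaries mapping a structural location (an XPath-like string) to its text, the "trained model" is any callable that returns a predicted location, ranking locations by how often the model predicts them across the sample is an assumed way of compiling the top-K list, and trying the K locations in rank order is an assumed way of using that list during extraction. The names `build_top_k`, `extract` and `toy_model` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical page representation (an assumption, not from the patent):
# maps a structural location, such as an XPath-like string, to its text.
Page = Dict[str, str]

# Hypothetical model interface: given a page and an attribute name, return the
# location the trained model predicts for that attribute, or None.
Model = Callable[[Page, str], Optional[str]]


def build_top_k(model: Model, sample: List[Page],
                attributes: List[str], k: int) -> Dict[str, List[str]]:
    """Build a structure-specific model: the top-K locations per attribute."""
    top_k: Dict[str, List[str]] = {}
    for attr in attributes:
        counts: Counter = Counter()
        for page in sample:
            loc = model(page, attr)
            if loc is not None:
                counts[loc] += 1
        # Rank locations by how often the model chose them across the sample
        # (assumed ranking criterion); ties keep first-seen order.
        top_k[attr] = [loc for loc, _ in counts.most_common(k)]
    return top_k


def extract(page: Page, top_k: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Extract each attribute by trying its top-K locations in rank order."""
    return {
        attr: next((page[loc] for loc in locations if loc in page), None)
        for attr, locations in top_k.items()
    }


def toy_model(page: Page, attr: str) -> Optional[str]:
    """Stand-in for a trained model: a crude, hypothetical heuristic."""
    for loc, text in page.items():
        if attr == "title" and loc.endswith("/h1"):
            return loc
        if attr == "price" and text.startswith("$"):
            return loc
    return None


if __name__ == "__main__":
    # A sample of three structurally similar pages; the third varies its layout.
    sample = [
        {"/div[1]/h1": "Camera X", "/div[2]/span": "$199"},
        {"/div[1]/h1": "Phone Y", "/div[2]/span": "$499"},
        {"/div[1]/h1": "Laptop Z", "/div[2]/b": "$999"},
    ]
    top_k = build_top_k(toy_model, sample, ["title", "price"], k=2)
    print(top_k)
    # → {'title': ['/div[1]/h1'], 'price': ['/div[2]/span', '/div[2]/b']}

    # A new page from the same cluster whose price sits in the rarer location.
    print(extract({"/div[1]/h1": "Tablet W", "/div[2]/b": "$299"}, top_k))
    # → {'title': 'Tablet W', 'price': '$299'}
```

In this sketch, counting predictions means that when the model is right on most sample pages, the correct location tends to rank first, while keeping K > 1 locations gives the extractor fallbacks for minor layout variations inside the cluster. Both choices are illustrative assumptions.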
Application publication number: US2010223214 (A1); publication date: 2010.09.02
Application number: US20090395586; filing date: 2009.02.27
Applicants: KIRPAL ALOK S; SATPAL SANDEEPKUMAR BHURAMAL; KSHIRSAGAR MEGHANA; SENGAMEDU SRINIVASAN H. Inventors: KIRPAL ALOK S.; SATPAL SANDEEPKUMAR BHURAMAL; KSHIRSAGAR MEGHANA; SENGAMEDU SRINIVASAN H.
Classification: G06F15/18; main classification: G06F15/18