Title: Systems, methods, and computer readable media for extracting data from portable document format (PDF) files
Abstract: One method occurs at a data file analyzer. The method includes identifying at least one document identifier associated with a first document in a portable document format (PDF) file. The method further includes determining, using the at least one document identifier, a reference point identifier for identifying a reference point in the first document, an offset value for indicating a location of a first detection area in the first document, and size information for indicating a size of the first detection area in the first document. The method also includes identifying, using the reference point identifier, the reference point in the first document. The method further includes identifying, using the offset value and the size information, the first detection area in the first document and extracting, by processing binary data of the PDF file, data within the first detection area of the first document.
Publication number: US9418315(B1) Publication date: 2016.08.16
Application number: US201615069913 Filing date: 2016.03.14
Applicant: Sageworks, Inc. Inventors: Keogh Timothy Francis; Hamilton Brian
Classification: G06K9/00; G06K9/62; G06F17/30 Main classification: G06K9/00
Agency: Jenkins, Wilson, Taylor & Hunt, P.A. Agent: Jenkins, Wilson, Taylor & Hunt, P.A.
Main claim: 1. A method for extracting data from a portable document format (PDF) file, the method comprising: identifying at least one document identifier associated with a first document in a portable document format (PDF) file; determining, using the at least one document identifier, a reference point identifier for identifying a reference point in the first document, an offset value for indicating a location of a first detection area in the first document, and size information for indicating a size of the first detection area in the first document; identifying, using the reference point identifier, the reference point in the first document; identifying, using the offset value and the size information, the first detection area in the first document; and extracting, by processing binary data of the PDF file, data within the first detection area of the first document.
Address: Raleigh NC US
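
The steps recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the PDF's binary data has already been decoded into positioned text runs, and every name in it (`TextItem`, `Template`, `TEMPLATES`, `identify_document_id`, `find_reference_point`, `extract_from_area`, the "EXAMPLE-FORM" identifier, and all coordinates) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TextItem:
    """A decoded text run and the (x, y) position of its origin on the page."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Template:
    """What a document identifier maps to: reference label, offset, and size."""
    reference_label: str          # reference point identifier (hypothetical form)
    offset: Tuple[float, float]   # (dx, dy) from the reference point to the area's lower-left corner
    size: Tuple[float, float]     # (width, height) of the detection area


# Hypothetical lookup table keyed by document identifier.
TEMPLATES: Dict[str, Template] = {
    "EXAMPLE-FORM": Template("Total income", offset=(200.0, -5.0), size=(120.0, 20.0)),
}


def identify_document_id(items: List[TextItem]) -> Optional[str]:
    """Return the first text run that matches a known document identifier."""
    for item in items:
        if item.text in TEMPLATES:
            return item.text
    return None


def find_reference_point(items: List[TextItem], label: str) -> Optional[Tuple[float, float]]:
    """Locate the reference point: the position of the run whose text equals the label."""
    for item in items:
        if item.text == label:
            return (item.x, item.y)
    return None


def extract_from_area(items: List[TextItem]) -> List[str]:
    """Identify the document, find the reference point, build the area, collect text inside it."""
    doc_id = identify_document_id(items)
    if doc_id is None:
        return []
    template = TEMPLATES[doc_id]
    ref = find_reference_point(items, template.reference_label)
    if ref is None:
        return []
    # Detection area = reference point shifted by the offset, extended by the size.
    left = ref[0] + template.offset[0]
    bottom = ref[1] + template.offset[1]
    right = left + template.size[0]
    top = bottom + template.size[1]
    return [it.text for it in items if left <= it.x <= right and bottom <= it.y <= top]


# Made-up page content, standing in for text runs decoded from a PDF.
PAGE = [
    TextItem("EXAMPLE-FORM", 50, 750),
    TextItem("Wages", 50, 520),
    TextItem("80,000", 260, 520),
    TextItem("Total income", 50, 500),
    TextItem("84,250", 260, 500),
]

print(extract_from_area(PAGE))
```

Anchoring the detection area to a reference point rather than to absolute page coordinates means the same offset and size can still locate the field when the surrounding content shifts between instances of the same document type. In this sketch the area is a simple axis-aligned rectangle in page coordinates, which is an assumption and not something the record specifies.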