Title Systems and methods for processing documents of unknown or unspecified format
Abstract Described herein are systems and methods for processing documents of unknown or unspecified format. Embodiments include methods (such as computer-implemented methods), computer programs configured to perform such methods, carrier media embodying code for allowing a computer system to perform such methods, and computer systems configured to perform such methods. According to one embodiment, the method includes extracting raw encoded text from a document and applying a process to identify markers/delimiters (for example the beginnings and ends of sections), to apply decompression (where necessary), and to identify a most likely character encoding protocol. This allows for conversion of the raw encoded text into meaningful text. [Figure: flowchart with the phases Document Stream Input, Chunk Identification Phase, Decompression Phase, Encoding Determination Phase, Output Phase]
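The phases named in the abstract (chunk identification, decompression, encoding determination, output) can be illustrated with a minimal Python sketch. This is not the patented method: the marker bytes, the candidate encoding list, the use of zlib, and the printable-character score are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import zlib

# Hypothetical section markers; a real format would define its own delimiters.
START, END = b"[[CHUNK]]", b"[[/CHUNK]]"

# Assumed candidate encodings, tried in order. latin-1 never fails to decode,
# so it acts as a last-resort fallback.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16-le", "cp1252", "latin-1")


def find_chunks(raw, start=START, end=END):
    """Chunk identification: yield the bytes between each start/end marker pair."""
    pos = 0
    while True:
        i = raw.find(start, pos)
        if i == -1:
            return
        j = raw.find(end, i + len(start))
        if j == -1:
            return
        yield raw[i + len(start):j]
        pos = j + len(end)


def maybe_decompress(chunk):
    """Decompression: try zlib, and keep the chunk unchanged if it is not zlib data."""
    try:
        return zlib.decompress(chunk)
    except zlib.error:
        return chunk


def score(text):
    """Crude plausibility score: fraction of printable or whitespace characters."""
    if not text:
        return 0.0
    good = sum(ch.isprintable() or ch in "\r\n\t" for ch in text)
    return good / len(text)


def decode_best(data):
    """Encoding determination: return (text, encoding) with the highest score.

    Ties go to the earlier candidate because the comparison is strict.
    """
    best_text, best_enc, best_score = "", None, -1.0
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue
        s = score(text)
        if s > best_score:
            best_text, best_enc, best_score = text, enc, s
    return best_text, best_enc


def process(raw):
    """Run all phases over a raw byte stream and return decoded chunks."""
    return [decode_best(maybe_decompress(chunk)) for chunk in find_chunks(raw)]
```

The printable-character score is deliberately simple and can tie between encodings (plain ASCII of even length also decodes as valid UTF-16), which is why the candidate order matters here. A practical system would likely need a stronger statistical or language-aware measure.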
Publication number AU2012201539(B2) Publication date 2016.06.16
Application number AU20120201539 Filing date 2012.03.16
Applicant ISYS SEARCH SOFTWARE PTY LTD Inventors MURPHY, DEREK;TRUSCOTT, BEN;DAVIES, IAN;COLES, SCOTT
Classification G06F17/21 Main classification G06F17/21
Agency Agent
Main claim
Address