Title of invention: DOCUMENT CLASSIFICATION USING MULTISCALE TEXT FINGERPRINTS
Abstract: The described systems and methods classify electronic documents, such as email messages and HTML documents, according to a document-specific text fingerprint. The text fingerprint is calculated for a text block of each target document and comprises a sequence of characters determined according to a plurality of text tokens of the respective text block. In some embodiments, the length of the text fingerprint is kept within a pre-determined range of lengths (e.g., between 129 and 256 characters) irrespective of the length of the text block, by zooming in for short text blocks and zooming out for long ones. Classification may include, for instance, determining whether an electronic document represents unsolicited communication (spam) or online fraud such as phishing.
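The zoom-in/zoom-out idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the tokenizer, the MD5-based token hash, the 62-character alphabet, and the helper names `token_hash`, `symbol`, `zoom_in`, `zoom_out`, and `text_fingerprint` are all assumptions made for illustration. Only the 129 to 256 character range comes from the abstract. In this sketch, zooming in emits several characters per token, and zooming out keeps only a hash-selected subset of tokens.

```python
import hashlib
import math
import re

# Assumed alphabet for fingerprint characters (not specified in the abstract).
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
MIN_LEN, MAX_LEN = 129, 256  # example range quoted in the abstract


def token_hash(token: str) -> int:
    """Stable integer hash of a token (MD5 is an arbitrary, assumed choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def symbol(token: str, index: int) -> str:
    """Map a (token, index) pair to one character of the alphabet."""
    return ALPHABET[token_hash(f"{token}#{index}") % len(ALPHABET)]


def zoom_in(tokens, chars_per_token: int) -> str:
    """Emit `chars_per_token` characters for every token."""
    return "".join(symbol(t, j) for t in tokens for j in range(chars_per_token))


def zoom_out(tokens, keep_one_in: int) -> str:
    """Keep only tokens whose hash is divisible by `keep_one_in`."""
    return "".join(symbol(t, 0) for t in tokens
                   if token_hash(t) % keep_one_in == 0)


def text_fingerprint(text: str) -> str:
    """Return a fingerprint whose length does not exceed MAX_LEN."""
    tokens = re.findall(r"\w+", text.lower())  # assumed tokenizer
    n = len(tokens)
    if n == 0:
        return ""
    if n <= MAX_LEN:
        # Short or mid-sized block: pad up to MIN_LEN by zooming in.
        return zoom_in(tokens, max(1, math.ceil(MIN_LEN / n)))
    # Long block: zoom out progressively until the fingerprint fits.
    fp = ""
    for k in range(2, n + 1):
        fp = zoom_out(tokens, k)
        if len(fp) <= MAX_LEN:
            return fp
    return fp[:MAX_LEN]  # fallback; not expected for typical text
```

For example, `text_fingerprint("win a free prize now")` has 5 tokens, so the sketch emits 26 characters per token and returns a 130-character fingerprint. The zoom-in branch always lands inside the 129 to 256 range. The zoom-out branch only guarantees the upper bound, because the hash-based selection is probabilistic. A real classifier would then compare fingerprints against known spam or phishing fingerprints, a step this sketch leaves out.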
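Publication number: HK1213705(A1) Publication date: 2016.07.08
Application number: HK20160101454 Filing date: 2016.02.05
Applicant: BITDEFENDER IPR MANAGEMENT LTD Inventors: TOMA, Adrian; TIBEIC, Marius, Nicolae
Classification: H04L Main classification: H04L
Agency: Agent:
Main claim:
Address: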