Abstract
The present invention involves a system and method that facilitate extracting data from messages for spam filtering. The extracted data can be in the form of features, which can be employed in connection with machine learning systems to build improved filters. Data associated with origination information, as well as other information embedded in the body of the message that allows a recipient of the message to contact and/or respond to the sender of the message, can be extracted as features. The features, or a subset thereof, can be normalized and/or deobfuscated prior to being employed as features of the machine learning systems. The (deobfuscated) features can be employed to populate a plurality of feature lists that facilitate spam detection and prevention. Exemplary features include an email address, an IP address, a URL, an embedded image pointing to a URL, and/or portions thereof.
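The abstract describes extraction, normalization/deobfuscation, and feature-list population only at a high level. Below is a minimal Python sketch of such a pipeline. It rests on several assumptions: the regular expressions; the specific deobfuscation steps (percent-decoding, ignoring a `user@` prefix in a URL, converting a host written as a single decimal integer into a dotted-quad IP, lowercasing); the choice of parent domains as the "portions" of a host; and every function and list name (`extract_features`, `normalize_host`, `domain_portions`, `as_ip`, `"url_domain"`, and so on). All of these are hypothetical illustrations, not the claimed method.

```python
# Hypothetical sketch: names, patterns and normalization rules are
# illustrative assumptions, not the method claimed in the abstract.
import ipaddress
import re
from collections import defaultdict
from typing import Optional
from urllib.parse import unquote, urlsplit

# Deliberately simple patterns; real mail needs MIME decoding, HTML parsing
# and far more robust matching.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IMG_RE = re.compile(r"<img\b[^>]*\bsrc\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def as_ip(text: str) -> Optional[str]:
    """Return canonical dotted-quad form if `text` is an IPv4 address,
    including one disguised as a single decimal integer; else None."""
    try:
        if text.isdigit():
            return str(ipaddress.IPv4Address(int(text)))
        return str(ipaddress.IPv4Address(text))
    except ValueError:  # AddressValueError subclasses ValueError
        return None


def normalize_host(host: str) -> str:
    """Deobfuscate a host: undo %-escapes, lowercase, drop a trailing dot,
    and canonicalize numeric IP forms."""
    host = unquote(host).lower().rstrip(".")
    return as_ip(host) or host


def domain_portions(host: str) -> list[str]:
    """Host plus its parent domains, stopping before the bare TLD:
    'a.b.example.com' -> ['a.b.example.com', 'b.example.com', 'example.com']."""
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]


def add_url_features(url: str, list_name: str,
                     lists: "defaultdict[str, set[str]]") -> None:
    """Add the normalized host of `url` (domain portions or an IP) to `lists`."""
    try:
        # .hostname drops any "user@" prefix, a common way to disguise the
        # real destination, e.g. http://www.bank.com@evil.net/
        host = urlsplit(url).hostname
    except ValueError:  # e.g. malformed IPv6 brackets
        return
    if not host:
        return
    host = normalize_host(host)
    if as_ip(host):
        lists["ip"].add(host)
    else:
        lists[list_name].update(domain_portions(host))


def extract_features(message: str) -> dict[str, set[str]]:
    """Populate one feature list (a set of strings) per feature type."""
    lists: "defaultdict[str, set[str]]" = defaultdict(set)
    for addr in EMAIL_RE.findall(message):
        addr = addr.lower()
        lists["email"].add(addr)
        lists["email_domain"].update(domain_portions(addr.split("@", 1)[1]))
    for url in URL_RE.findall(message):
        add_url_features(url, "url_domain", lists)
    for src in IMG_RE.findall(message):  # embedded images pointing to URLs
        add_url_features(src, "img_domain", lists)
    for candidate in IPV4_RE.findall(message):  # e.g. from Received: headers
        ip = as_ip(candidate)
        if ip:
            lists["ip"].add(ip)
    return dict(lists)


if __name__ == "__main__":
    msg = ("Reply to Sales@Deals.Example.com or visit "
           "http://www.bank.com@%65vil.net/login or http://3232235777/offer "
           '<img src="http://img.tracker.example.org/p.gif"> '
           "Received: from [203.0.113.7]")
    features = extract_features(msg)
    print(sorted(features["ip"]))  # ['192.168.1.1', '203.0.113.7']
    print(sorted(features["url_domain"]))
```

Each returned set can be read as one feature list. Because obfuscated variants such as `%65vil.net` and `3232235777` collapse to `evil.net` and `192.168.1.1`, a learner sees the same feature regardless of how the spammer disguised it. How such lists are weighted or fed into a particular machine learning model is outside the scope of this sketch.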