Automated data classification, Application No. US201314108119

Title	Automated data classification
Abstract	A system and method for data classification are presented. A plurality of training tokens are identified by at least one server communicatively coupled to a network. Each training token includes a token retrieved from a content source and a classification of the token. For each training token in the plurality of training tokens, a plurality of n-gram sequences are identified, a plurality of features for the plurality of n-gram sequences are generated, and first training data is generated using the token retrieved from the content source, the plurality of features, and the classification of the token. A first classifier is trained with the first training data, and the first classifier is stored into a storage system in communication with the at least one server.
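The first stage described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record does not say whether the n-grams are over characters or words, which features are derived from them, or which classifier is used. The sketch assumes character n-grams with `<` and `>` boundary markers, n-gram counts as features, and a small multinomial naive Bayes model; the names `char_ngrams`, `make_features`, `NaiveBayes`, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n_values=(1, 2, 3)):
    """Return the character n-grams of a token (assumption: character-level)."""
    padded = f"<{token}>"  # boundary markers are an assumption of this sketch
    return [padded[i:i + n] for n in n_values for i in range(len(padded) - n + 1)]


def make_features(token):
    """One feature per distinct n-gram, valued by how often it occurs."""
    return Counter(char_ngrams(token))


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, rows):
        # rows: iterable of (token, features, classification)
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for _token, features, label in rows:
            self.label_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        def log_score(label):
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(self.label_counts[label])
            for feat, value in features.items():
                score += value * math.log((self.feature_counts[label][feat] + 1) / denom)
            return score
        return max(self.label_counts, key=log_score)


# Hypothetical training tokens: (token from a content source, classification).
training_tokens = [
    ("$19.99", "price"), ("$5.00", "price"), ("$120.50", "price"),
    ("555-1234", "phone"), ("555-9876", "phone"), ("480-555-0100", "phone"),
]

# First training data: token, its n-gram features, and its classification.
first_training_data = [(tok, make_features(tok), label) for tok, label in training_tokens]
first_classifier = NaiveBayes().fit(first_training_data)

print(first_classifier.predict(make_features("$7.25")))
print(first_classifier.predict(make_features("555-4321")))
```

Storing the trained classifier, which the abstract also mentions, is omitted here; in Python it could be done with any serialization mechanism.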
Publication No.	US9483740(B1)	Publication date	2016.11.01
Application No.	US201314108119	Filing date	2013.12.16
Applicant	Go Daddy Operating Company, LLC	Inventors	Ansel Jason; Marcus Adam; Olszewski Marek; Mierle Keir
Classification	G06F17/00; G06N99/00; G06F17/27; G06F17/30	Main classification	G06F17/00
Agency	Quarles & Brady LLP	Agent	Quarles & Brady LLP
Main claim	1. A method, comprising: identifying, by at least one server communicatively coupled to a network, a plurality of training tokens, each training token including a token retrieved from a content source and a classification of the token; for each training token in the plurality of training tokens: identifying, by the at least one server, a plurality of n-gram sequences, generating, by the at least one server, a plurality of features for the plurality of n-gram sequences, and generating, by the at least one server, first training data using the token retrieved from the content source, the plurality of features, and the classification of the token; training a first classifier with the first training data; storing, by the at least one server, the first classifier into a storage system in communication with the at least one server; for each training token in the plurality of training tokens: identifying a plurality of related tokens in the content source, for each of the related tokens in the content source: identifying a second plurality of n-gram sequences, and generating a second plurality of features using the second plurality of n-gram sequences and by executing the first classifier on the related token to generate a probable classification of the related token; generating second training data using the second plurality of features; training a second classifier with the second training data; and storing, by the at least one server, the second classifier into the storage system in communication with the at least one server.
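The two-stage flow in the claim, where a second classifier is trained on features of related tokens plus the first classifier's probable classification of each related token, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: "related tokens" are taken to be the other tokens on the same hypothetical page, n-grams are character-level, and a small naive Bayes model stands in for both classifiers. All names and sample data (`content_source`, `labels`, `second_features`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngram_features(token, prefix="", n_values=(1, 2, 3)):
    """Character n-gram counts for a token; prefix namespaces context features."""
    padded = f"<{token}>"
    return Counter(prefix + padded[i:i + n]
                   for n in n_values for i in range(len(padded) - n + 1))


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, rows):
        # rows: iterable of (features, classification)
        self.labels, self.counts, self.vocab = Counter(), defaultdict(Counter), set()
        for features, label in rows:
            self.labels[label] += 1
            self.counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        def log_score(label):
            denom = sum(self.counts[label].values()) + len(self.vocab)
            return math.log(self.labels[label]) + sum(
                v * math.log((self.counts[label][f] + 1) / denom)
                for f, v in features.items())
        return max(self.labels, key=log_score)


# Hypothetical content source: each inner list is the token sequence of one page.
content_source = [
    ["Price:", "$19.99"], ["Sale", "price", "$5.00"], ["Cost:", "$120.50"],
    ["Phone:", "555-1234"], ["Call", "us", "555-9876"], ["Tel:", "480-555-0100"],
]
# Hypothetical classifications of the training tokens.
labels = {"$19.99": "price", "$5.00": "price", "$120.50": "price",
          "555-1234": "phone", "555-9876": "phone", "480-555-0100": "phone"}

# Stage 1: first classifier trained on the n-gram features of each training token.
first_classifier = NaiveBayes().fit(
    (ngram_features(tok), label) for tok, label in labels.items())


def related_tokens(page, token):
    """Assumption: related tokens are the other tokens on the same page."""
    return [t for t in page if t != token]


def second_features(page, token):
    """N-grams of each related token plus the first classifier's guess for it."""
    features = Counter()
    for related in related_tokens(page, token):
        features.update(ngram_features(related, prefix="ctx:"))
        guess = first_classifier.predict(ngram_features(related))
        features[f"ctx_label={guess}"] += 1
    return features


# Stage 2: second training data and second classifier.
second_training_data = [(second_features(page, tok), labels[tok])
                        for page in content_source for tok in page if tok in labels]
second_classifier = NaiveBayes().fit(second_training_data)

print(second_classifier.predict(second_features(["Price:", "$9.99"], "$9.99")))
```

In this sketch the second classifier sees only the context of a token, so it can classify a token whose own surface form was never seen in training, provided the surrounding tokens resemble the training contexts.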
Address	Scottsdale AZ US