Title of invention: Method and system for selecting documents by measuring document quality
Abstract: The present invention relates to a system and method for classifying documents in order to select the most desirable documents of a group. Because quality is very difficult for anyone other than a human being to distinguish, this invention provides a system and method that creates a profile of what constitutes quality, then uses this profile to allow a user to retrieve information that is desirable. A client is provided with items of data selected according to estimates computed using a profile of certain high-level criteria such as quality, interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment, and any combinations thereof. These estimates are computed from low-level criteria such as length, vocabulary, fraction of words spelled correctly, title, author, reading grade level, average length of sentences, average length of words, usage of punctuation, usage of grammar, formatting, capitalization, source, display tags and any combinations thereof. The profile is learned automatically from labeled training examples. The invention also relates to a method of obtaining and automatically associating a value to an item of data by obtaining items, obtaining labels for some items, selecting items of data with certain labels to form training sets, learning a profile using the training sets, and associating a value to another item of data using said profile. As such, the program is capable of learning to measure which items are of high quality and is capable of delivering only those items of data which would be of interest to a client.
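The steps named in the abstract (compute low-level criteria, learn a profile from labeled training examples, associate a value with a new item, select the most desirable items) can be illustrated with a minimal Python sketch. The abstract does not specify a learning algorithm, so the nearest-centroid profile used here is an assumption chosen for brevity, not the patented method. The function names (`low_level_features`, `learn_profile`, `estimate`, `select`), the four chosen features, and the toy training texts are all hypothetical.

```python
import re
from statistics import mean, pstdev


def low_level_features(text):
    """Compute a few of the low-level criteria listed in the abstract."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return [0.0, 0.0, 0.0, 0.0]
    capitalized = [s for s in sentences if s[0].isupper()]
    return [
        float(len(words)),                    # length in words
        mean([len(w) for w in words]),        # average length of words
        len(words) / len(sentences),          # average length of sentences
        len(capitalized) / len(sentences),    # capitalization usage
    ]


def learn_profile(examples):
    """Learn a profile from (text, label) pairs; label 1 = desirable, 0 = not.

    Assumption: the profile is a pair of class centroids in scaled feature
    space. The abstract leaves the learning method open.
    """
    rows = [(low_level_features(text), label) for text, label in examples]
    n = len(rows[0][0])
    # Scale each feature by its spread so no single feature dominates.
    scales = []
    for i in range(n):
        sd = pstdev([vec[i] for vec, _ in rows])
        scales.append(sd if sd > 0 else 1.0)

    def centroid(label):
        members = [vec for vec, y in rows if y == label]
        return [mean([m[i] for m in members]) / scales[i] for i in range(n)]

    return {"scales": scales, "pos": centroid(1), "neg": centroid(0)}


def estimate(profile, text):
    """Associate a value with an item: positive means closer to 'desirable'."""
    x = [f / s for f, s in zip(low_level_features(text), profile["scales"])]
    d_pos = sum((a - b) ** 2 for a, b in zip(x, profile["pos"]))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, profile["neg"]))
    return d_neg - d_pos


def select(profile, items, k=1):
    """Return the k items with the highest estimated value."""
    return sorted(items, key=lambda t: estimate(profile, t), reverse=True)[:k]


# Toy labeled training set (hypothetical data, for illustration only).
TRAINING = [
    ("The committee reviewed the proposal carefully. "
     "It approved the budget after a long discussion.", 1),
    ("Researchers measured the temperature every hour. "
     "The results were consistent across all samples.", 1),
    ("lol ok", 0),
    ("buy now!!! cheap cheap", 0),
]

profile = learn_profile(TRAINING)
candidates = [
    "omg so bad",
    "The report describes the experiment in detail. "
    "Each measurement was repeated several times.",
]
best = select(profile, candidates, k=1)
```

In this sketch, `best` holds the candidate whose low-level features lie closest to the desirable training examples. A fuller system would add the other criteria from the abstract (spelling, reading grade level, source, display tags) and could substitute any supervised learner for the centroid profile.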
Publication number: US2002055940(A1)  Publication date: 2002.05.09
Application number: US20010004514  Filing date: 2001.11.02
Applicant: ELKAN CHARLES  Inventor: ELKAN CHARLES
Classification: G06F17/30;(IPC1-7):G06F7/00  Main classification: G06F17/30
Agency:  Agent:
Principal claim:
Address: