Title Method, apparatus, and computer program product for classification of documents
Abstract Provided herein are systems, methods and computer readable media for classification of documents using a location hierarchy. An example method may include receiving a feature vector r that represents occurrence counts of references in a document's text to each of a group of named entities, and determining whether the document is associated with the particular location by querying, to determine a query result, using feature vector r, at least one location-specific classifier from a group of location-specific classifiers, wherein the location-specific classifier is associated with the particular location, and wherein the location-specific classifier is configured to generate a positive output value in response to receiving an input feature vector representing occurrence count of at least one reference to the particular named entity and determining that the document is associated with the particular location in an instance in which the query result includes data indicating that the positive output value was generated by the location-specific classifier that is associated with the particular location.
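The querying step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the entity list, the location names, the helper names (`make_location_classifier`, `query`, `is_associated`), and the rule "output 1 when any associated entity has a count of at least 1" are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, Sequence

# Hypothetical entity order: element i of r counts references to ENTITIES[i].
ENTITIES = ["Wrigley Field", "Navy Pier", "Golden Gate Bridge"]

Classifier = Callable[[Sequence[int]], int]


def make_location_classifier(entity_indices: Sequence[int]) -> Classifier:
    """Build a classifier that returns 1 (positive output) when the input
    vector counts at least one reference to any entity tied to the location,
    and 0 otherwise. This decision rule is an assumption."""
    def classify(r: Sequence[int]) -> int:
        return 1 if any(r[i] > 0 for i in entity_indices) else 0
    return classify


# Hypothetical group of location-specific classifiers, one per location.
CLASSIFIERS: Dict[str, Classifier] = {
    "Chicago": make_location_classifier([0, 1]),
    "San Francisco": make_location_classifier([2]),
}


def query(r: Sequence[int]) -> Dict[str, int]:
    """Query every location-specific classifier with feature vector r."""
    return {location: clf(r) for location, clf in CLASSIFIERS.items()}


def is_associated(r: Sequence[int], location: str) -> bool:
    """The document is associated with `location` when the query result shows
    a positive output from the classifier for that location."""
    return query(r).get(location, 0) == 1
```

For example, with `r = [2, 0, 0]` (two references to the first entity), `is_associated(r, "Chicago")` is true and `is_associated(r, "San Francisco")` is false.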
Publication number US9589184(B1) Publication date 2017.03.07
Application number US201313969008 Filing date 2013.08.16
Applicant Groupon, Inc. Inventors Castillo Roger Henry; Humphrey Brian Andrew
Classification G06K9/00 Main classification G06K9/00
Agency Alston & Bird LLP Agent Alston & Bird LLP
Principal claim 1. A computer-implemented method, comprising: receiving a feature vector r that represents occurrence counts of references in a document's text to each of a group of named entities, wherein a particular named entity within the group of named entities is associated with a particular location; and determining, by a processor, whether the document is associated with the particular location by performing operations comprising: querying, to determine a query result, using feature vector r, at least one location-specific classifier from a group of location-specific classifiers, wherein the location-specific classifier is associated with the particular location, and wherein the location-specific classifier is configured to generate a positive output value in response to receiving an input feature vector representing occurrence count of at least one reference to the particular named entity; and determining that the document is associated with the particular location in an instance in which the query result includes data indicating that the positive output value was generated by the location-specific classifier that is associated with the particular location, wherein feature vector r is a vector of values with each element of feature vector r being an occurrence count of references within the document's text to one of the group of named entities, each element having an index position, wherein generating the feature vector r comprises: for each named entity in the group of named entities, generating a reference bit vector for each reference within the document's text to the named entity; and calculating a sum vector from the generated reference bit vectors.
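The final limitation of the claim, building feature vector r by summing reference bit vectors, can be sketched as follows. The claim does not define the bit vector layout or how references are detected, so this sketch assumes a one-hot vector (a single 1 at the entity's index position) and case-insensitive exact string matching. The entity list and the function names are hypothetical.

```python
import re
from typing import List, Sequence

# Hypothetical group of named entities; list position is the index position.
ENTITIES = ["Wrigley Field", "Navy Pier", "Golden Gate Bridge"]


def reference_bit_vector(entity_index: int, size: int) -> List[int]:
    """One bit vector per reference: 1 at the entity's index, 0 elsewhere.
    The one-hot layout is an assumption, not stated in the claim."""
    bits = [0] * size
    bits[entity_index] = 1
    return bits


def feature_vector(text: str, entities: Sequence[str]) -> List[int]:
    """Generate a bit vector for every reference to every named entity,
    then sum the bit vectors element-wise to obtain r."""
    size = len(entities)
    r = [0] * size
    for index, name in enumerate(entities):
        # Assumed reference detection: literal, case-insensitive matches.
        for _match in re.finditer(re.escape(name), text, flags=re.IGNORECASE):
            bits = reference_bit_vector(index, size)
            r = [total + bit for total, bit in zip(r, bits)]
    return r
```

Under these assumptions, each element of the sum vector equals the occurrence count for the entity at that index position. For the text "Wrigley Field is near Navy Pier. We love Wrigley Field." the result is `[2, 1, 0]`.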
Address Chicago IL US