Title of Invention |
Header-token driven automatic text segmentation |
Abstract |
A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented. |
Publication Number |
US9053091(B2) |
Publication Date |
2015.06.09 |
Application Number |
US201314100990 |
Filing Date |
2013.12.09 |
Applicant |
eBay Inc. |
Inventors |
Sarwar Badrul M.; Mount John A. |
Classification Numbers |
G06F17/30;G06F17/27 |
Main Classification Number |
G06F17/30 |
Agency |
Schwegman Lundberg & Woessner, P.A. |
Agent |
Schwegman Lundberg & Woessner, P.A. |
Main Claim |
1. A method comprising:
assigning a value to a first token in a description, the value indicating either:
that the first token also occurs in a header of the description,
that a lexical association exists between the first token and a second token in the header, or
that the lexical association does not exist and the first token is absent from the header;
computing a relevance value of a group of tokens that occur in the description and include the first token with the assigned value, the relevance value of the group being computed by a processor of a machine based on the value assigned to the first token;
indicating that the group of tokens is a most relevant group of tokens in the description;
wherein:
the assigning of the value to the first token includes initially assigning a default value that indicates the lexical association does not exist and the first token is absent from the header; and
the assigning of the value to the first token includes overwriting the initially assigned default value based on the first token occurring in the header. |
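The steps recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation. The numeric value codes, the `WEIGHTS` table, the `associations` lookup, the averaging used as the group relevance, and the fixed-size sliding window used to form candidate groups are all assumptions introduced here for illustration; the claim only requires a default value that is overwritten when the token occurs in the header, a group relevance computed from the assigned values, and an indication of the most relevant group.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical codes for the three cases named in the claim.
NO_ASSOCIATION = 0  # default: token absent from header, no lexical association
ASSOCIATED = 1      # token is lexically associated with some header token
IN_HEADER = 2       # token also occurs in the header

# Hypothetical per-code weights used to score a group of tokens.
WEIGHTS = {NO_ASSOCIATION: 0.0, ASSOCIATED: 0.5, IN_HEADER: 1.0}


def assign_values(
    description: Sequence[str],
    header: Sequence[str],
    associations: Dict[str, Set[str]],
) -> List[int]:
    """Assign a value to each description token.

    Every token starts with the default value; the default is then
    overwritten when the token occurs in the header, or (assumed here)
    when it is associated with a header token.
    """
    header_set = set(header)
    values = [NO_ASSOCIATION] * len(description)  # initial default
    for i, token in enumerate(description):
        if token in header_set:
            values[i] = IN_HEADER  # overwrite: token occurs in header
        elif associations.get(token, set()) & header_set:
            values[i] = ASSOCIATED  # overwrite: lexical association
    return values


def group_relevance(values: Sequence[int]) -> float:
    """Relevance of a group, assumed here to be the mean token weight."""
    return sum(WEIGHTS[v] for v in values) / len(values)


def most_relevant_group(
    description: Sequence[str],
    header: Sequence[str],
    associations: Dict[str, Set[str]],
    size: int = 3,
) -> Tuple[List[str], float]:
    """Return the highest-scoring contiguous group of `size` tokens."""
    if not description:
        return [], 0.0
    values = assign_values(description, header, associations)
    best_start, best_score = 0, -1.0
    for start in range(max(1, len(description) - size + 1)):
        score = group_relevance(values[start:start + size])
        if score > best_score:  # strict: the earliest best group wins ties
            best_start, best_score = start, score
    return list(description[best_start:best_start + size]), best_score


# Made-up example data.
description = "free shipping new apple ipod nano 8gb silver fast delivery".split()
header = "apple ipod nano".split()
associations = {"8gb": {"ipod"}}  # hypothetical association table

values = assign_values(description, header, associations)
group, score = most_relevant_group(description, header, associations)
```

In this sketch the group is a contiguous window for simplicity; the claim itself does not require contiguity or any particular scoring formula.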
Address |
San Jose CA US |