Title of invention: Header-token driven automatic text segmentation
Abstract: A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no token is left out of the computations. The irrelevance value is based on occurrences of the token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
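As an illustration of the irrelevance value only, the following is a minimal sketch that assumes the value is the fraction of sample descriptions in which a token occurs. The abstract says only that the value is "based on occurrences" in a sample set, so this document-frequency formula, the function name `irrelevance_values`, and the sample data are all hypothetical and not taken from the patent.

```python
from collections import Counter


def irrelevance_values(sample_descriptions):
    """Map each token to the fraction of sample descriptions containing it.

    Assumption (not from the patent): a token that appears in many unrelated
    descriptions, such as "free" or "shipping", is treated as more likely to
    be irrelevant to any one item.
    """
    total = len(sample_descriptions)
    if total == 0:
        return {}
    counts = Counter()
    for tokens in sample_descriptions:
        # Count each token at most once per description.
        counts.update(set(tokens))
    return {token: count / total for token, count in counts.items()}


# Hypothetical sample set of tokenized descriptions.
samples = [
    ["free", "shipping", "red", "jacket"],
    ["free", "shipping", "blue", "coat"],
    ["free", "returns"],
    ["leather", "coat"],
]
values = irrelevance_values(samples)
```

With this sample set, a boilerplate token such as "free" receives a higher irrelevance value than an item-specific token such as "leather".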
Publication number: US9053091(B2) Publication date: 2015.06.09
Application number: US201314100990 Filing date: 2013.12.09
Applicant: eBay Inc. Inventors: Sarwar Badrul M.; Mount John A.
Classification: G06F17/30; G06F17/27 Main classification: G06F17/30
Agency: Schwegman Lundberg & Woessner, P.A. Agent: Schwegman Lundberg & Woessner, P.A.
Principal claim: 1. A method comprising: assigning a value to a first token in a description, the value indicating either: that the first token also occurs in a header of the description, that a lexical association exists between the first token and a second token in the header, or that the lexical association does not exist and the first token is absent from the header; computing a relevance value of a group of tokens that occur in the description and include the first token with the assigned value, the relevance value of the group being computed by a processor of a machine based on the value assigned to the first token; indicating that the group of tokens is a most relevant group of tokens in the description; wherein: the assigning of the value to the first token includes initially assigning a default value that indicates the lexical association does not exist and the first token is absent from the header; and the assigning of the value to the first token includes overwriting the initially assigned default value based on the first token occurring in the header.
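The steps of the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three value codes, the numeric `WEIGHTS`, the representation of lexical associations as a set of token pairs, the choice of a contiguous fixed-size window as the "group of tokens", and the mean-weight score are all hypothetical. The claim specifies only that each token receives one of three values, that a default is assigned first and overwritten when the token occurs in the header, and that a group's relevance is computed from those values.

```python
# Hypothetical codes for the three values named in the claim.
ABSENT = 0      # default: no lexical association, token not in header
ASSOCIATED = 1  # lexical association with some header token
IN_HEADER = 2   # token also occurs in the header

# Hypothetical per-value weights used to score a group of tokens.
WEIGHTS = {ABSENT: 0.1, ASSOCIATED: 0.6, IN_HEADER: 0.9}


def assign_values(description_tokens, header_tokens, associations):
    """Assign one of the three values to every description token.

    `associations` is an assumed set of (description_token, header_token)
    pairs standing in for whatever lexical-association data is available.
    """
    header = set(header_tokens)
    values = []
    for token in description_tokens:
        value = ABSENT  # initially assign the default value
        if any((token, h) in associations for h in header):
            value = ASSOCIATED
        if token in header:
            value = IN_HEADER  # overwrite the default: token is in the header
        values.append(value)
    return values


def most_relevant_group(description_tokens, values, size):
    """Return the contiguous group of `size` tokens with the highest mean weight."""
    if size <= 0 or size > len(description_tokens):
        raise ValueError("size must be between 1 and the number of tokens")
    best_start, best_score = 0, float("-inf")
    for start in range(len(description_tokens) - size + 1):
        window = values[start:start + size]
        score = sum(WEIGHTS[v] for v in window) / size
        if score > best_score:
            best_start, best_score = start, score
    return description_tokens[best_start:best_start + size], best_score


# Hypothetical listing: header plus a description with boilerplate up front.
header = ["red", "leather", "jacket"]
description = "free shipping on all orders genuine leather coat in red".split()
associations = {("coat", "jacket")}

token_values = assign_values(description, header, associations)
group, score = most_relevant_group(description, token_values, size=2)
```

Here the boilerplate tokens keep the default value, "coat" is marked as associated with "jacket", and "leather" and "red" are overwritten as header tokens, so the window containing "leather coat" scores highest.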
Address: San Jose CA US