Title of invention: SYSTEMS AND METHODS FOR TOKENIZING AND INTERPRETING UNIFORM RESOURCE LOCATORS
Abstract: Aspects include methods, computer-readable media storing instructions for such methods, and systems for processing text strings, such as URLs, that comprise patterns of parameters and values for those parameters, delimited in a site-specific manner. Such aspects provide for accepting a number of text strings that are expected to share a common delimiting strategy, then deeply tokenizing those text strings to arrive at a set of tokens. Anchor tokens are selected from that set and used to form patterns in which the anchor tokens are separated by wildcard portions for recursive processing. The patterns formed can be mapped to a tree of nodes. Information concerning relationships between nodes and between tokens within a given node, as well as other heuristics concerning which tokens are parameters and which are values, can be used as observed events for producing probabilities that certain tokens are parameters or values, using a dynamic programming algorithm such as a Viterbi algorithm.
Publication number: US2009327304(A1)  Publication date: 2009.12.31
Application number: US20080163898  Filing date: 2008.06.27
Applicant: YAHOO! INC.  Inventors: AGARWAL AMIT JAGDISH; POOLA KRISHNA LEELA
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Principal claim:
Address:
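
The abstract describes deeply tokenizing URLs and then using a dynamic programming algorithm such as Viterbi to decide which tokens are parameters and which are values. The following is a minimal sketch of that idea, not the patented method: the function names (`deep_tokenize`, `emission`, `viterbi`), the two-state model, and all probabilities are hypothetical values chosen for illustration. The only observed event used here is whether a token contains a digit, whereas the abstract also draws on anchor-token patterns and tree relationships, which this sketch omits.

```python
import math
import re

STATES = ("param", "value")

# Hypothetical probabilities, chosen only for illustration.
START = {"param": 0.6, "value": 0.4}
TRANS = {
    "param": {"param": 0.3, "value": 0.7},
    "value": {"param": 0.7, "value": 0.3},
}


def deep_tokenize(text):
    """Split on every run of non-alphanumeric characters, dropping empty tokens."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


def emission(state, token):
    """Assumed heuristic: tokens containing a digit look more like values."""
    has_digit = any(ch.isdigit() for ch in token)
    if state == "value":
        return 0.8 if has_digit else 0.3
    return 0.2 if has_digit else 0.7


def viterbi(tokens):
    """Return the most likely param/value label sequence for the tokens."""
    if not tokens:
        return []
    # Log-probabilities avoid underflow on long token sequences.
    scores = [{s: math.log(START[s]) + math.log(emission(s, tokens[0]))
               for s in STATES}]
    backpointers = []
    for tok in tokens[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = (prev[best] + math.log(TRANS[best][s])
                      + math.log(emission(s, tok)))
            ptr[s] = best
        scores.append(cur)
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


tokens = deep_tokenize("id=42&page=7")
labels = viterbi(tokens)
```

For the query string `id=42&page=7`, the tokens are `id`, `42`, `page`, `7`, and under the assumed probabilities the labels alternate between parameter and value. A fuller treatment would replace the digit heuristic with evidence gathered across many URLs from the same site.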