Title of invention: KEYWORD EXTRACTION FROM UNIFORM RESOURCE LOCATORS (URLS)
Abstract: The keyword extraction technique described herein extracts keywords from Uniform Resource Locators (URLs) in web logs. The technique leverages both the content and the structure of URLs to extract relevant keywords. First, a URL is divided into multiple components based on its structure. A set of keywords is extracted from each component of the URL independently with the help of a controlled vocabulary. Then a second set of keywords is generated by forming combinations of terms from different segments of the URL. Only those combinations that are present in the controlled vocabulary are retained as keywords. Finally, the keywords are scored with a function that takes into account a wide set of features.
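The steps in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the application: the split into host, path, and query components, the tiny `VOCABULARY` set, the `COMPONENT_WEIGHT` table, and the toy scoring rule (the abstract only says the scoring function uses a wide set of features without naming them). All function names are hypothetical.

```python
import re
from itertools import combinations
from urllib.parse import urlsplit, parse_qsl

# Hypothetical controlled vocabulary; a real one would be far larger.
VOCABULARY = {"camera", "digital camera", "reviews", "camera reviews"}

# Assumed per-component weights, used only by the toy scoring rule below.
COMPONENT_WEIGHT = {"host": 1.0, "path": 2.0, "query": 3.0}


def split_components(url):
    """Divide a URL into components based on its structure."""
    parts = urlsplit(url)
    query_values = " ".join(value for _, value in parse_qsl(parts.query))
    return {"host": parts.hostname or "", "path": parts.path, "query": query_values}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def component_keywords(tokens, vocabulary, max_len=3):
    """Keep contiguous n-grams of one component that are in the vocabulary."""
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in vocabulary:
                found.add(phrase)
    return found


def cross_component_keywords(tokens_by_component, vocabulary):
    """Combine terms from different components; keep only vocabulary entries."""
    found = set()
    for (_, left), (_, right) in combinations(tokens_by_component.items(), 2):
        for x in left:
            for y in right:
                for phrase in (f"{x} {y}", f"{y} {x}"):
                    if phrase in vocabulary:
                        found.add(phrase)
    return found


def extract_keywords(url, vocabulary=VOCABULARY):
    """Return (keyword, score) pairs sorted by descending score."""
    tokens_by_component = {
        name: tokenize(text) for name, text in split_components(url).items()
    }
    scores = {}
    # First set: keywords found inside each component independently.
    for name, tokens in tokens_by_component.items():
        for keyword in component_keywords(tokens, vocabulary):
            # Toy score (assumption): component weight times word count.
            score = COMPONENT_WEIGHT[name] * len(keyword.split())
            scores[keyword] = max(scores.get(keyword, 0.0), score)
    # Second set: combinations of terms drawn from different components.
    for keyword in cross_component_keywords(tokens_by_component, vocabulary):
        scores.setdefault(keyword, 1.0)  # flat toy score (assumption)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    url = "http://www.example.com/digital-camera/index.html?q=reviews"
    for keyword, score in extract_keywords(url):
        print(f"{score:.1f}  {keyword}")
```

In this sketch, "camera reviews" is only found by the cross-component step, because "camera" comes from the path and "reviews" from the query string. The vocabulary check is what keeps the pairwise combinations from producing noise such as "example digital".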
Publication number: WO2012125350(A3) Publication date: 2012.11.22
Application number: WO2012US27927 Filing date: 2012.03.07
Applicant: MICROSOFT CORPORATION Inventors: VYSYARAJU, SANTOSH R.; UDUPA, UPPINAKUDURU RAGHAVENDRA; BHOLE, ABHIJIT N.; DASSA, GUY; LIU, WEIGUO; XIAO, QING
Classification: G06F17/27 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address: