Title ORGANIZATIONAL URL ENRICHMENT
Abstract In an example embodiment, an organization name is retrieved from an organization name field of a first record. A web search is performed using the organization name, producing web search results. A second plurality of features is extracted for each web search result in the set of web search results. Each of the extracted second plurality of features for each web search result in the set of web search results is input into a supervised machine learning classifier to classify each of the web search results in the set of web search results as either containing an organization web address or not containing an organization web address. In response to a determination by the supervised machine learning classifier that a first web search result contains an organization web address, the organization web address from the first web search result is injected into an organization web address field of the first record.
Publication No. US2017091270(A1) Publication date 2017.03.30
Application No. US201514929109 Filing date 2015.10.30
Applicant LinkedIn Corporation Inventors Guo Songtao;Degiere Christopher Matthew;Kumar Aarti;Lai Alex Ching;Li Xian
Classification G06F17/30;G06N99/00 Main classification G06F17/30
Agency Agent
Main claim 1. A computer-implemented method for enriching a web address field of a first record in a computer system, the method comprising: obtaining a set of sample search results; extracting a first plurality of features for each search result in the set of sample search results; labeling at least one of the sample search results in the set of sample search results as containing an organization web address associated with an organization name; feeding the labeled set of sample web search results and the first set of extracted features into a supervised machine learning classifier to train the supervised machine learning classifier to recognize when a search result contains an organization web address based on the first plurality of features; retrieving an organization name from an organization name field of the first record; performing a web search using the organization name, producing a set of web search results; extracting a second plurality of features for each web search result in the set of web search results; inputting each of the extracted second plurality of features for each web search result in the set of web search results into the supervised machine learning classifier to classify each of the web search results in the set of web search results as either containing an organization web address or not containing an organization web address; and in response to a determination by the supervised machine learning classifier that a first web search result contains an organization web address, injecting the organization web address from the first web search result into an organization web address field of the first record.
Address Mountain View CA US
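
Illustrative sketch (not part of the patent record). The steps of claim 1 (train on labeled sample search results, extract features from new search results, classify them, and write a positively classified URL into the record) can be sketched as below. Everything concrete here is an assumption made for illustration: the record does not name the features, the classifier type, or any field names. The three features, the hand-written logistic regression standing in for "a supervised machine learning classifier", the `org_name`/`org_url` keys, the `.example` URLs, and the choice to keep the highest-scoring positive result are all hypothetical. The web search itself is replaced by hard-coded result lists.

```python
import math
from urllib.parse import urlparse


def extract_features(org_name, result):
    """Hypothetical features; the patent record does not list the real ones."""
    name = "".join(ch for ch in org_name.lower() if ch.isalnum())
    host = (urlparse(result["url"]).hostname or "").lower()
    host_compact = "".join(ch for ch in host if ch.isalnum())
    return [
        1.0,                                             # bias term
        1.0 if name and name in host_compact else 0.0,   # name appears in host
        1.0 if org_name.lower() in result["title"].lower() else 0.0,  # name in title
        1.0 / result["rank"],                            # rank 1 -> 1.0, rank 2 -> 0.5
    ]


def predict_proba(weights, features):
    """Probability that a result contains the organization's web address."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Logistic regression fitted by batch gradient descent (stand-in classifier)."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            err = predict_proba(weights, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights


def enrich(record, search_results, weights, threshold=0.5):
    """Inject the highest-scoring positively classified URL into the record."""
    best_url, best_p = None, threshold
    for result in search_results:
        p = predict_proba(weights, extract_features(record["org_name"], result))
        if p > best_p:
            best_url, best_p = result["url"], p
    if best_url is not None:
        record["org_url"] = best_url
    return record


# Labeled sample search results (made-up data): 1 = contains the organization URL.
training = [
    ("Acme Widgets", {"url": "https://www.acmewidgets.example/", "title": "Acme Widgets - Home", "rank": 1}, 1),
    ("Acme Widgets", {"url": "https://directory.example/acme", "title": "Acme Widgets profile", "rank": 2}, 0),
    ("Globex", {"url": "https://news.example/globex-story", "title": "Globex in the news", "rank": 1}, 0),
    ("Globex", {"url": "https://globex.example/", "title": "Globex", "rank": 2}, 1),
]
weights = train(
    [extract_features(name, res) for name, res, _ in training],
    [label for _, _, label in training],
)

# A record with an empty URL field, plus stand-in web search results for its name.
record = {"org_name": "Initech"}
results = [
    {"url": "https://en.wiki.example/Initech", "title": "Initech - encyclopedia", "rank": 1},
    {"url": "https://www.initech.example/", "title": "Initech | Official site", "rank": 2},
]
enrich(record, results, weights)
print(record)
```

The claim only requires that a positively classified result's address be injected; picking the highest-probability result when several are positive is one possible tie-breaking policy, chosen here for simplicity. When no result clears the threshold, the record's URL field is left unset.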