EXTRACTING PRINCIPAL CONTENT FROM WEB PAGES, Application No. EP20120847034 - 传众 Patent Search

Title of invention	EXTRACTING PRINCIPAL CONTENT FROM WEB PAGES
Abstract	Extracting principal content from Web pages includes identifying and classifying items on the Web page, building a list of candidates, calculating candidate scores, selecting a top score candidate, performing clean-up processing for the top score candidate, and performing final page processing for the top score candidate. Candidate scores may vary according to a number of paragraphs and images grouped according to size. A word length of CJK (Chinese-Japanese-Korean) text may be determined according to punctuation therein. Candidate scores may be modified according to a number of containers and pieces, where a container is a Web page element that is associated with the tags 'body', 'div', 'td', 'li', or 'article/section', and pieces are candidates that do not include other candidates. Candidate scores may be modified according to a number of ratios corresponding to text and link density.
Publication number	EP2776945(A1)	Publication date	2014.09.17
Application number	EP20120847034	Filing date	2012.11.07
Applicant	EVERNOTE CORPORATION	Inventors	BIGNERT, JAKOB; COARNA, GABRIEL, ALEXANDRU
Classification	G06F17/30	Main classification	G06F17/30
Agency		Agent
Main claim
Address
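
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract gives no formulas, so the Candidate class, the weights (1.0, 2.0, 0.1), the punctuation set, and the way link density scales the score are all assumptions chosen for illustration. Only the container tags and the definition of a piece (a candidate that includes no other candidates) come from the abstract.

```python
import re
from dataclasses import dataclass, field

# From the abstract: tags that mark a container element.
CONTAINER_TAGS = {"body", "div", "td", "li", "article", "section"}

# Assumption: punctuation marks treated as segment boundaries in CJK text.
CJK_BOUNDARY = re.compile(r"[。、,;:!?,.;:!?]")


def is_cjk(ch: str) -> bool:
    """True for CJK ideographs, kana, and Hangul syllables."""
    return ("\u4e00" <= ch <= "\u9fff"
            or "\u3040" <= ch <= "\u30ff"
            or "\uac00" <= ch <= "\ud7a3")


def estimate_word_count(text: str) -> int:
    """Assumed proxy: CJK text has no spaces, so count the segments
    between punctuation marks; other text is split on whitespace."""
    if any(is_cjk(ch) for ch in text):
        return sum(1 for seg in CJK_BOUNDARY.split(text) if seg.strip())
    return len(text.split())


@dataclass
class Candidate:
    """Hypothetical record for one candidate element on the page."""
    tag: str
    text: str = ""            # all text inside the element
    link_text: str = ""       # the part of `text` that sits inside links
    paragraphs: int = 0
    large_images: int = 0
    children: list = field(default_factory=list)  # nested candidates


def is_container(c: Candidate) -> bool:
    return c.tag in CONTAINER_TAGS


def is_piece(c: Candidate) -> bool:
    # From the abstract: a piece includes no other candidates.
    return not c.children


def link_density(c: Candidate) -> float:
    """Share of the element's text that is link text (0.0 to 1.0)."""
    return len(c.link_text) / len(c.text) if c.text else 0.0


def score(c: Candidate) -> float:
    # Assumed weights: paragraphs, large images, and word count add to
    # the score; a high link density scales it down.
    base = (1.0 * c.paragraphs
            + 2.0 * c.large_images
            + 0.1 * estimate_word_count(c.text))
    return base * (1.0 - link_density(c))


def select_top(candidates: list) -> Candidate:
    return max(candidates, key=score)


article = Candidate(
    tag="article",
    text="Extracting content is useful. It finds the main article.",
    paragraphs=2,
    large_images=1,
)
nav = Candidate(tag="div", text="Home Products Login",
                link_text="Home Products Login")

best = select_top([article, nav])
print(best.tag)
```

In this sketch the navigation block scores zero because all of its text is link text, so the article element is selected. A fuller version would also need the clean-up and final page processing steps that the abstract mentions but does not describe.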
