Title of Invention: APPARATUS AND METHOD FOR ABSTRACTING MARKUP LANGUAGE DOCUMENTS
Abstract: An apparatus and a method generate a hyperlinked abstract from a markup language document by parsing the document to create a syntax tree, statistically analyzing the syntax tree based on at least one rule, classifying information at each node of the syntax tree, adapting information at each node of the classified tree for output, and summarizing the adapted tree to create a hyperlinked abstract of the document for presentation at an output device. The abstract is a summarized version of the document. It occupies less bandwidth than the document, so it can be transmitted to a user much faster, even if the user's computing system and connection are not very sophisticated. Through the abstract, the user can quickly see what the document covers. If more detailed information is wanted, the user can reach that material in the document through hyperlinks. In one embodiment, the summarization step includes grouping, in which a predetermined number of nodes are grouped together. In another embodiment, after summarization, the tree can be modified by an output-specific filter and sent to an output device.
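The parse, classify, summarize, and group steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` class, the heading-based classification rule, the `#sec-N` link scheme, and the `group_size` parameter are all hypothetical choices made for illustration, and the statistical analysis and output-specific filter are omitted. It assumes well-formed markup without void elements such as `<br>`.

```python
from dataclasses import dataclass, field
from html import escape
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Node:
    """One node of the (hypothetical) syntax tree."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)
    label: str = ""  # filled in by classify()


class TreeBuilder(HTMLParser):
    """Builds a simple tree; assumes well-formed markup, no void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        piece = data.strip()
        if piece:
            current = self.stack[-1]
            current.text = (current.text + " " + piece).strip()


def classify(node):
    """Label each node; this rule is an assumption, not the patent's."""
    if node.tag in HEADING_TAGS:
        node.label = "heading"
    elif node.text:
        node.label = "text"
    else:
        node.label = "container"
    for child in node.children:
        classify(child)


def collect(node, label, out):
    """Gather nodes with the given label in document order."""
    if node.label == label:
        out.append(node)
    for child in node.children:
        collect(child, label, out)
    return out


def summarize(root, doc_url, group_size=2):
    """Turn headings into links, then group a fixed number per group."""
    headings = collect(root, "heading", [])
    links = [
        f'<a href="{doc_url}#sec-{i}">{escape(h.text)}</a>'  # hypothetical anchors
        for i, h in enumerate(headings, 1)
    ]
    return [links[i:i + group_size] for i in range(0, len(links), group_size)]


def make_abstract(markup, doc_url, group_size=2):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    classify(builder.root)
    return summarize(builder.root, doc_url, group_size)
```

Calling `make_abstract(markup, "http://example.com/doc")` returns a list of groups, each holding up to `group_size` links that point back into the full document. The body text is left out, which is why the result is smaller than the source.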
Publication No.: WO0042531(A3) Publication Date: 2000.11.30
Application No.: WO2000US00202 Filing Date: 2000.01.05
Applicant: YAHOO, INC. Inventors: BALASUBRAMANIAM, SHANMUGASUNDER; VISHWANATH, MOHAN; MENDHEKAR, ANURAG
Classification: G06F17/30; (IPC1-7): G06F17/30 Main Classification: G06F17/30
Agency: Agent:
Main Claim:
Address: