摘要 |
An improved system and method is provided for hierarchical segmentation of websites by topic. To do so, an organization of topics may be determined within directories of a website, the hierarchical arrangement of the web pages in the website may be segmented by topic, and the segments representing regions of coherent topics in the website directory may be output. In an embodiment, a website directory may be converted into a binary tree and dynamic programming may be applied to iteratively determine whether to add a node of the tree to a segment representing a topic. A node selection cost may be evaluated to determine whether to add a node of the tree as a segment representing a topic. And a cohesiveness cost may be evaluated to determine how well a web page of the tree may be represented by its closest ancestral node that may be a segmentation point of a segment representing a topic.
|