摘要 |
Network pages are identified based on whether the pages include image alternative text that indicates that the network pages contain links to pages that are translations of each other. A plurality of pages and a plurality of respective uniform resource locators are downloaded from a server associated with the domain name of the identified network pages. The uniform resource locators are used to identify a set of candidate parallel page pairs and a set of features are created for each candidate parallel page pair. The sets of features are used to identify parallel page pairs, wherein the pages in a parallel page pair are translations of each other.
|