Title: Aligning hierarchal and sequential document trees to identify parallel data
Abstract: A set of candidate parallel pages is identified based on trigger words in one or more pages downloaded from a given network location (such as a website). Document trees representing the candidate pages are aligned to identify translationally parallel content and hyperlinks. The parallel content is then fed into a conventional sentence aligner to extract parallel sentences. The parallel hyperlinks usually refer to other parallel documents, which leads to recursive mining of parallel documents.
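The first step in the abstract, spotting candidate parallel pages from trigger words, can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say where the trigger words are looked up or which words are used, so the sketch assumes they are matched against hyperlink anchor text, and the `TRIGGER_WORDS` list and the `find_candidate_links` name are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical trigger words; the abstract does not list any.
TRIGGER_WORDS = ("english", "chinese", "中文")


class _AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_candidate_links(html, trigger_words=TRIGGER_WORDS):
    """Return hrefs whose anchor text contains a trigger word.

    Assumption: trigger words are matched case-insensitively against
    anchor text; the source only says pages are selected "based on
    trigger words".
    """
    parser = _AnchorCollector()
    parser.feed(html)
    parser.close()
    return [
        href
        for href, text in parser.links
        if any(word in text.lower() for word in trigger_words)
    ]
```

Each returned link would point at a page that is a candidate translation of the page being scanned. The later steps in the abstract (document tree alignment, sentence alignment, and recursion over parallel hyperlinks) are not covered by this sketch.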
Publication No.: US2008010056(A1)  Publication date: 2008.01.10
Application No.: US20060483941  Filing date: 2006.07.10
Applicant: MICROSOFT CORPORATION  Inventors: ZHOU MING; NIU CHENG; SHI LEI
Classification: G06F17/20  Main classification: G06F17/20
Agency:  Agent:
Main claim:
Address: