Title of invention: Aligning source texts of different natural languages to produce or add to an aligned corpus
Abstract: A plurality of source text files, representing similar information but in different natural languages, are read. The files have correlated layouts, in that the same layout commands are employed at similar points in the files. Similar text from the respective files is aligned by identifying its position between equivalent word processing (WP) commands. Preferably, intermediate files are produced in which the WP commands are converted into an identifiable form. Aligned text, which differs between the intermediate files whereas the WP commands do not, is identified by a differential comparison operation, such as a call to DIFF within a UNIX environment.
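The alignment idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the angle-bracket command syntax, the `@CMD ` marker, and the function names `to_intermediate` and `align` are all assumptions made for illustration, and Python's standard `difflib.SequenceMatcher` stands in for the UNIX DIFF call the abstract mentions. Each source is rewritten as an intermediate list of lines in which commands carry an identifiable prefix; lines that differ between the two lists are then taken as aligned text.

```python
import difflib
import re

# Hypothetical command syntax: <name> ... </name>. Real WP formats differ.
COMMAND = re.compile(r"(<[^<>]+>)")
MARKER = "@CMD "  # assumed prefix that makes commands identifiable


def to_intermediate(source: str) -> list[str]:
    """Return one line per WP command or per run of text."""
    lines = []
    for piece in COMMAND.split(source):
        piece = piece.strip()
        if not piece:
            continue
        if COMMAND.fullmatch(piece):
            lines.append(MARKER + piece[1:-1])  # e.g. "@CMD h1"
        else:
            lines.append(piece)  # text assumed not to start with MARKER
    return lines


def align(source_a: str, source_b: str) -> list[tuple[str, str]]:
    """Pair up text that sits between equivalent commands."""
    a = to_intermediate(source_a)
    b = to_intermediate(source_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" spans are the shared commands; "replace" spans are
        # the regions where the two languages differ.
        if tag != "replace":
            continue
        text_a = [ln for ln in a[i1:i2] if not ln.startswith(MARKER)]
        text_b = [ln for ln in b[j1:j2] if not ln.startswith(MARKER)]
        if text_a and text_b:
            pairs.append((" ".join(text_a), " ".join(text_b)))
    return pairs


if __name__ == "__main__":
    english = "<h1>Getting started</h1><p>The printer is ready.</p>"
    french = "<h1>Mise en route</h1><p>L'imprimante est prête.</p>"
    for pair in align(english, french):
        print(pair)
```

The sketch relies on the property the abstract states: the command lines match across the files while the text lines do not, so a line-based comparison isolates the text. Text that happens to be identical in both languages would be treated as unchanged and is not paired here.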
Publication number: US5893134(A)
Publication date: 1999.04.06
Application number: US19960650967
Filing date: 1996.05.21
Applicant: CANON EUROPA N.V.;CANON RESEARCH CENTRE EUROPE LIMITED
Inventor: O'DONOGHUE, TIMOTHY FRANCIS;WACHTEL, THOMAS JULIUSZ
Classification: G06F17/21;G06F17/27;G06F17/28;(IPC1-7):G06F17/28
Main classification: G06F17/21
Agency:
Agent:
Principal claim:
Address: