Title of invention: MULTI-LINGUAL DOCUMENT PROCESSING DEVICE AND METHOD
Abstract: PROBLEM TO BE SOLVED: To improve the accuracy of bootstrapping when generating a sufficiently large set of similar document pairs. SOLUTION: An initial corpus holding means 1 holds a set of document pairs, each composed of a document written in a first language and a document written in a second language that stand in a translation relationship. A universal corpus holding means 2 holds a set of documents written in the first language and a set of documents written in the second language. A statistical processing means 3 quantifies the similarity between a document written in the first language and a document written in the second language, based on the word occurrence frequency information of the document pairs held in the initial corpus holding means 1 and on the analysis result of a syntactic and semantic analysis means 4. The document pairs determined by the statistical processing means 3 are stored in a corpus holding means 5 and added to the initial corpus holding means 1, and the statistical processing is repeated.
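The loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the similarity formula, so the sketch assumes one: a word co-occurrence table built from the seed pairs, with cosine similarity between a second-language document and the first-language document projected through that table. The function names (`cooccurrence`, `similarity`, `bootstrap`), the threshold, and the round limit are all hypothetical, and the syntactic and semantic analysis step (means 4) is omitted entirely.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence(pairs):
    """Count, per first-language word, the second-language words seen in paired documents."""
    table = defaultdict(Counter)
    for doc1, doc2 in pairs:
        for w1 in set(doc1.split()):
            for w2 in set(doc2.split()):
                table[w1][w2] += 1
    return table


def similarity(doc1, doc2, table):
    """Cosine between doc2's word counts and doc1's counts projected through the table."""
    projected = Counter()
    for w1, n in Counter(doc1.split()).items():
        for w2, c in table.get(w1, {}).items():
            projected[w2] += n * c
    target = Counter(doc2.split())
    dot = sum(projected[w] * target[w] for w in target)
    norm = sqrt(sum(v * v for v in projected.values())) * sqrt(
        sum(v * v for v in target.values())
    )
    return dot / norm if norm else 0.0


def bootstrap(seed_pairs, docs1, docs2, threshold=0.8, max_rounds=5):
    """Repeatedly pair unpaired documents and feed accepted pairs back into the corpus.

    threshold and max_rounds are assumed values, not taken from the patent.
    """
    corpus = list(seed_pairs)          # plays the role of the initial corpus (means 1)
    docs1, docs2 = list(docs1), list(docs2)  # unpaired documents (means 2)
    found = []                         # newly determined pairs (means 5)
    for _ in range(max_rounds):
        table = cooccurrence(corpus)
        # Score every remaining cross-language combination, best first.
        scored = sorted(
            ((similarity(d1, d2, table), d1, d2) for d1 in docs1 for d2 in docs2),
            reverse=True,
        )
        new_pairs = []
        for score, d1, d2 in scored:
            if score < threshold:
                break
            # Accept a pair only if neither document was already matched this round.
            if d1 in docs1 and d2 in docs2:
                docs1.remove(d1)
                docs2.remove(d2)
                new_pairs.append((d1, d2))
        if not new_pairs:
            break
        corpus.extend(new_pairs)       # accepted pairs enlarge the statistics for the next round
        found.extend(new_pairs)
    return found


if __name__ == "__main__":
    seed = [
        ("cat fish", "neko sakana"),
        ("cat milk", "neko miruku"),
        ("dog meat", "inu niku"),
        ("dog bone", "inu hone"),
    ]
    for pair in bootstrap(
        seed,
        ["cat milk fish", "dog bone meat"],
        ["inu hone niku", "neko miruku sakana"],
    ):
        print(pair)
```

Whitespace tokenization and the greedy best-first matching are simplifications chosen to keep the sketch short. A realistic system would need proper morphological analysis for each language and a far larger seed corpus before the co-occurrence counts become informative.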
Publication number: JP2003141109(A) Publication date: 2003.05.16
Application number: JP20010342193 Filing date: 2001.11.07
Applicant: FUJI XEROX CO LTD Inventor: MASUICHI HIROSHI
Classification: G06F17/27 Main classification: G06F17/27
Agency: Agent:
Principal claim:
Address: