Title of invention: Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids
Abstract: A method of calculating trigram path probabilities for an input string of text containing a multi-word entry (MWE) or a factoid includes tokenizing the input string to create a plurality of parse leaf units (PLUs). A PosColumn is constructed for each word, MWE, factoid, and character in the input string that has a unique first (Ft) and last (Lt) token pair. TrigramColumns are constructed that define corresponding TrigramNodes, each representing a trigram over three PosColumns. Forward and backward trigram path probabilities are calculated for each separate TrigramNode. The sum of all trigram path probabilities through each PLU is then calculated as a function of the forward and backward trigram path probabilities. Systems and computer-readable media configured to implement the methods are also provided.
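The forward and backward calculation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: `Unit`, `START`, `path_sums`, and the `trans` callback are hypothetical names introduced here, each `Unit` is a simplified stand-in for one tagged span identified by its first (Ft) and last (Lt) token, and all probabilities are made-up values. The sketch also assumes that no end-of-sentence transition is scored and that a trigram is represented implicitly by a pair of adjacent units plus the unit before them.

```python
from collections import defaultdict
from typing import NamedTuple


class Unit(NamedTuple):
    """Hypothetical stand-in for one tagged span (not the patent's PosColumn)."""
    ft: int      # index of the first token covered
    lt: int      # index of the last token covered
    tag: str     # part-of-speech tag
    emit: float  # P(text | tag); illustrative value only


START = Unit(-1, -1, "<s>", 1.0)  # padding unit placed before the first token


def path_sums(units, n_tokens, trans):
    """Return ({unit: summed probability of all full paths through it}, total)."""
    by_start, by_end = defaultdict(list), defaultdict(list)
    for u in units:
        by_start[u.ft].append(u)
        by_end[u.lt].append(u)

    def preds(u):
        # Units that can directly precede u; START pads the left edge.
        return [START] if u.ft <= 0 else by_end[u.ft - 1]

    # forward[(a, b)]: summed probability of partial paths ending in a, b.
    forward = {(START, START): 1.0}
    for b in sorted(units, key=lambda u: u.ft):
        for a in preds(b):
            forward[(a, b)] = b.emit * sum(
                forward.get((z, a), 0.0) * trans(z.tag, a.tag, b.tag)
                for z in preds(a)
            )

    # backward[(a, b)]: summed probability of all completions after a, b.
    backward = {}
    for b in sorted(units, key=lambda u: u.ft, reverse=True):
        for a in preds(b):
            if b.lt == n_tokens - 1:
                backward[(a, b)] = 1.0
            else:
                backward[(a, b)] = sum(
                    trans(a.tag, b.tag, c.tag) * c.emit * backward[(b, c)]
                    for c in by_start[b.lt + 1]
                )

    # Every full path through b has exactly one predecessor a, so summing
    # forward * backward over the predecessors counts each path once.
    through = {
        b: sum(forward[(a, b)] * backward[(a, b)] for a in preds(b))
        for b in units
    }
    total = sum(p for b, p in through.items() if b.lt == n_tokens - 1)
    return through, total


# Tokens: "New", "York", "rocks"; "New York" is also offered as one MWE.
mwe = Unit(0, 1, "Noun", 0.4)
units = [Unit(0, 0, "Adj", 0.5), Unit(1, 1, "Noun", 0.5), mwe,
         Unit(2, 2, "Verb", 0.6), Unit(2, 2, "Noun", 0.4)]


def uniform(t1, t2, t3):
    """Placeholder trigram model: every tag trigram gets probability 0.5."""
    return 0.5


through, total = path_sums(units, 3, uniform)
posterior_mwe = through[mwe] / total  # share of path mass using the MWE reading
```

Dividing a unit's summed path probability by the total, as in the last line, gives a relative score for that reading. Because the MWE spans two tokens, paths through it contain fewer trigram factors than paths through the single-word readings, which is the kind of imbalance a lattice with multi-token units has to account for.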
Publication number: US2005234717(A1)  Publication date: 2005.10.20
Application number: US20050151953  Filing date: 2005.06.14
Applicant: MICROSOFT CORPORATION  Inventors: WEISE DAVID N.; BALA ARAVIND
Classification: G06F17/27;(IPC1-7):G06F17/27  Main classification: G06F17/27
Agency:  Agent:
Principal claim:
Address: