Title of invention |
Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids |
Abstract |
A method of calculating trigram path probabilities for an input string of text containing a multi-word entry (MWE) or a factoid includes tokenizing the input string to create a plurality of parse leaf units (PLUs). A PosColumn is constructed for each word, MWE, factoid and character in the input string of text that has a unique first (Ft) and last (Lt) token pair. TrigramColumns are constructed which define corresponding TrigramNodes, each representing a trigram for three PosColumns. Forward and backward trigram path probabilities are calculated for each separate TrigramNode. The sums of all trigram path probabilities through each PLU are then calculated as a function of the forward and backward trigram path probabilities. Systems and computer-readable media configured to implement the methods are also provided.
|
Publication number |
US2005234717(A1) |
Publication date |
2005.10.20 |
Application number |
US20050151953 |
Filing date |
2005.06.14 |
Applicant |
MICROSOFT CORPORATION |
Inventors |
WEISE DAVID N.;BALA ARAVIND |
Classification |
G06F17/27;(IPC1-7):G06F17/27 |
Main classification |
G06F17/27 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|
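
The forward/backward computation summarized in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: it assumes each PLU is given as a hypothetical tuple `(ft, lt, tag, emit)` (first token index, last token index, POS tag, emission probability), that trigram transition probabilities come from a hypothetical dictionary `trans` keyed by tag triples, and that the lattice is padded with two start markers and one end marker. The function name `plu_path_sums` and the `default` fallback probability are likewise illustrative. Each trigram node is a triple of adjacent PLUs, and the sum through a PLU is the forward times backward mass of every node in which that PLU is the last element.

```python
from collections import defaultdict


def plu_path_sums(plus, n_tokens, trans, default=1e-6):
    """Sum trigram path probabilities through each PLU of a token lattice.

    plus     : list of (ft, lt, tag, emit) tuples (assumed input format)
    n_tokens : number of tokens in the input string
    trans    : dict mapping (tag1, tag2, tag3) -> P(tag3 | tag1, tag2)
    default  : fallback transition probability (an assumption of this sketch)
    Returns (sums, total): sums maps (ft, lt, tag) to the probability mass of
    all paths through that PLU; total is the mass of all complete paths.
    """
    # Padding units so that the first real PLU already has a full trigram.
    bos2 = (-2, -2, "<s>", 1.0)
    bos1 = (-1, -1, "<s>", 1.0)
    eos = (n_tokens, n_tokens, "</s>", 1.0)
    units = [bos2, bos1, *plus, eos]

    # Index units by first token so adjacency (next.ft == prev.lt + 1) is cheap.
    starts = defaultdict(list)
    for u in units:
        starts[u[0]].append(u)

    def nexts(u):
        return starts.get(u[1] + 1, [])

    # A trigram node is a triple of adjacent units; sorting by the first token
    # of the last unit gives a valid left-to-right processing order.
    nodes = [(a, b, c) for a in units for b in nexts(a) for c in nexts(b)]
    nodes.sort(key=lambda node: node[2][0])

    def score(node):
        a, b, c = node
        return trans.get((a[2], b[2], c[2]), default) * c[3]

    ending = defaultdict(list)     # (b, c) -> nodes whose last two units are b, c
    beginning = defaultdict(list)  # (a, b) -> nodes whose first two units are a, b
    for node in nodes:
        ending[node[1:]].append(node)
        beginning[node[:2]].append(node)

    # Forward pass: mass of all partial paths from the start that end in node.
    fwd = {}
    for node in nodes:
        a, b, _ = node
        inbound = 1.0 if a == bos2 else sum(fwd[p] for p in ending[(a, b)])
        fwd[node] = inbound * score(node)

    # Backward pass: mass of all continuations from node to the end marker.
    bwd = {}
    for node in reversed(nodes):
        _, b, c = node
        if c == eos:
            bwd[node] = 1.0
        else:
            bwd[node] = sum(score(s) * bwd[s] for s in beginning[(b, c)])

    sums = defaultdict(float)
    total = 0.0
    for node in nodes:
        c = node[2]
        if c == eos:
            total += fwd[node]
        else:
            sums[c[:3]] += fwd[node] * bwd[node]
    return dict(sums), total
```

Dividing each entry of `sums` by `total` would give the posterior probability that a path uses that PLU, which is one way an MWE reading spanning two tokens could be compared against the two single-word readings it competes with.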