Abstract |
A method is provided for identifying compound terms in a document represented by a stream of tokens. The token stream is scanned for an initial term associated with a compound term, and a compound term template is accessed when the initial term is identified. The template includes content, retention, and token specifications for the compound term. The stream of tokens is compared with the template, and when the stream matches the content specification, a token representing the compound term is tagged according to the retention specification and added to the stream. The tagged token is then stopped according to the retention specification represented by its tag.
|
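The method summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Template`, `Token`, `tag_compounds`, `stop`, and `TEMPLATES` are hypothetical, as are the two retention values `"keep"` and `"drop"`. The sketch also assumes that the content specification is the full token sequence of the compound term, that the compound token is inserted directly after the initial term, and that the constituent tokens remain in the stream.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Template:
    content: Tuple[str, ...]  # content specification: tokens of the compound term
    retention: str            # retention specification (assumed: "keep" or "drop")
    token: str                # token specification: token emitted for the compound


@dataclass
class Token:
    text: str
    tag: Optional[str] = None  # retention tag, set only on compound tokens


# Hypothetical templates, indexed by the initial term of each compound term.
TEMPLATES: Dict[str, List[Template]] = {
    "new": [Template(("new", "york"), "keep", "new_york")],
    "of": [Template(("of", "course"), "drop", "of_course")],
}


def tag_compounds(stream: List[str],
                  templates: Dict[str, List[Template]]) -> List[Token]:
    """Scan the stream and add a tagged compound token for each template match."""
    out: List[Token] = []
    for i, word in enumerate(stream):
        out.append(Token(word))
        # Templates are accessed only when an initial term is identified.
        for tpl in templates.get(word, []):
            window = tuple(stream[i:i + len(tpl.content)])
            if window == tpl.content:
                # Tag the compound token according to the retention specification.
                out.append(Token(tpl.token, tag=tpl.retention))
    return out


def stop(tokens: List[Token]) -> List[str]:
    """Drop tagged tokens whose retention tag says they should be stopped."""
    return [t.text for t in tokens if t.tag != "drop"]


tagged = tag_compounds(["visit", "new", "york", "of", "course"], TEMPLATES)
kept = stop(tagged)
```

With these assumed templates, `tagged` contains the compound tokens `new_york` (tagged `"keep"`) and `of_course` (tagged `"drop"`), and `kept` retains `new_york` while `of_course` is stopped. Indexing templates by initial term means the template comparison runs only at positions where a compound term could begin.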