Abstract |
A grammar coverage tool operates on sets of sentences that can be expressed as ordered sequences of word-class tags. Each sentence has previously been marked as either a positive or a negative example with respect to some predetermined preference criterion. The entire set is divided into two parts: a training set and a test set. The training set is rewritten as a set of binary vectors, in which each tag that occurs in the training set is an attribute. The binary vectors are then submitted to an algorithm that induces a decision tree classifying the training set into positive and negative examples. The induced tree is then applied to the test set, and its output is compared with the original, authentic classification of the test set, so that the percentage error rate can be calculated. If the error rate is too high, the process can be rerun with a larger volume of training data until an acceptably low error rate is achieved. The decision trees may be induced using either the ID3 algorithm or the C4.5 algorithm. |
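
The procedure described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation. It assumes that an attribute is 1 when the tag occurs anywhere in the sentence (so tag order and repetition are ignored), that tags unseen in training are dropped when encoding the test set, and that a plain ID3-style induction with information gain and no pruning is used. The tag names, the toy sentences, and all function names are hypothetical.

```python
import math
from collections import Counter

def to_binary_vectors(sentences, tags=None):
    """Encode each tag sequence as a 0/1 vector (assumption: 1 = tag occurs)."""
    if tags is None:
        # Attributes are the tags that occur in the training set.
        tags = sorted({t for s in sentences for t in s})
    return tags, [[1 if t in s else 0 for t in tags] for s in sentences]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Return a leaf label, or a node (attr_index, subtree_if_0, subtree_if_1)."""
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:
        return majority

    def gain(a):
        remainder = 0.0
        for v in (0, 1):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            if sub:
                remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    branches = []
    for v in (0, 1):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        if sub:
            branches.append(id3([r for r, _ in sub], [l for _, l in sub], rest))
        else:
            branches.append(majority)  # empty branch: fall back to majority label
    return (best, branches[0], branches[1])

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, if0, if1 = tree
        tree = if1 if row[attr] else if0
    return tree

def error_rate(tree, rows, labels):
    """Percentage of rows whose predicted label differs from the original one."""
    wrong = sum(classify(tree, r) != l for r, l in zip(rows, labels))
    return 100.0 * wrong / len(labels)

# Hypothetical toy data: True = positive example, False = negative example.
train_sents = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB", "ADV"],
               ["NOUN", "VERB"], ["DET", "DET", "NOUN"],
               ["ADV", "DET"], ["NOUN", "ADV"]]
train_labels = [True, True, True, False, False, False]
test_sents = [["DET", "NOUN", "VERB"], ["ADV", "NOUN"],
              ["VERB", "PRON"], ["DET", "NOUN"]]
test_labels = [True, False, True, True]

tags, train_rows = to_binary_vectors(train_sents)
_, test_rows = to_binary_vectors(test_sents, tags)  # reuse training attributes
tree = id3(train_rows, train_labels, list(range(len(tags))))
rate = error_rate(tree, test_rows, test_labels)
print(f"error rate on test set: {rate:.1f}%")
```

If the reported error rate were judged too high, the same steps would be repeated with a larger training set. Replacing `id3` with a C4.5-style learner would leave the encoding and evaluation steps unchanged.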