Abstract
A system and method for binary classification of text units, such as sentences, paragraphs and documents, as either a rule of law (ROL) or not a rule of law (~ROL) (206). During a training phase (202) of the system and method of the present invention, an initialized knowledge base and labeled or pre-classified sentences are used to build a trained knowledge base. The trained knowledge base contains an equation (404), a threshold (405), and a plurality of statistical values called Z values (502). When text documents are input for classification, a Z value is generated for each term or token in the input text. The Z values are input to the equation, which calculates a score for each sentence. Each calculated score is compared to the threshold to classify each sentence as either ROL or ~ROL.
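The abstract does not disclose how the Z values are computed, what the equation (404) is, or how the threshold (405) is chosen. The Python sketch below is one hypothetical way to wire the described pipeline together, assuming that (a) a term's Z value is a two-proportion z-statistic comparing how often the term appears in ROL versus ~ROL training sentences, (b) the scoring equation is the mean Z value of a sentence's tokens, and (c) the threshold is the midpoint between adjacent training scores that gives the best training accuracy. The function names, the tokenizer and the toy training set are illustrative assumptions, not the patented method.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

ROL, NOT_ROL = "ROL", "~ROL"

# Hypothetical toy training set of pre-classified sentences.
EXAMPLES: List[Tuple[str, str]] = [
    ("A contract requires offer, acceptance and consideration.", ROL),
    ("A court must dismiss a claim that is barred by limitations.", ROL),
    ("The plaintiff filed the complaint on Monday.", NOT_ROL),
    ("The witness testified that the car was red.", NOT_ROL),
]


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase words with surrounding punctuation stripped.
    words = (w.strip(".,;:()[]\"'").lower() for w in text.split())
    return [w for w in words if w]


def term_z_values(examples: List[Tuple[str, str]]) -> Dict[str, float]:
    # Assumed Z value: two-proportion z-test on the fraction of ROL vs ~ROL
    # sentences that contain each term (per-class document frequency).
    rol_df, other_df = Counter(), Counter()
    n_rol = n_other = 0
    for text, label in examples:
        terms = set(tokenize(text))
        if label == ROL:
            rol_df.update(terms)
            n_rol += 1
        else:
            other_df.update(terms)
            n_other += 1
    if n_rol == 0 or n_other == 0:
        raise ValueError("need at least one ROL and one ~ROL example")
    z: Dict[str, float] = {}
    for term in set(rol_df) | set(other_df):
        x1, x2 = rol_df[term], other_df[term]
        pooled = (x1 + x2) / (n_rol + n_other)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_rol + 1 / n_other))
        z[term] = (x1 / n_rol - x2 / n_other) / se if se > 0 else 0.0
    return z


def score(text: str, z: Dict[str, float]) -> float:
    # Assumed scoring equation: mean Z value of the tokens; unseen tokens count as 0.
    tokens = tokenize(text)
    return sum(z.get(t, 0.0) for t in tokens) / len(tokens) if tokens else 0.0


def fit_threshold(examples: List[Tuple[str, str]], z: Dict[str, float]) -> float:
    # Hypothetical threshold rule: try midpoints between adjacent distinct
    # training scores and keep the one with the highest training accuracy.
    scored = [(score(text, z), label) for text, label in examples]
    values = sorted({s for s, _ in scored})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def accuracy(th: float) -> int:
        return sum((s >= th) == (label == ROL) for s, label in scored)

    return max(candidates, key=accuracy)


def classify(text: str, z: Dict[str, float], threshold: float) -> str:
    # Compare the sentence score to the threshold, as the abstract describes.
    return ROL if score(text, z) >= threshold else NOT_ROL


if __name__ == "__main__":
    z = term_z_values(EXAMPLES)
    th = fit_threshold(EXAMPLES, z)
    for sentence in ("A defendant must prove the defense.",
                     "The defendant filed the motion on Friday."):
        print(classify(sentence, z, th), sentence)
```

On this toy data the first test sentence scores above the learned threshold and is labeled ROL, while the second is labeled ~ROL. The mean-of-Z score keeps long and short sentences on a comparable scale; a weighted sum or a logistic model would be equally plausible stand-ins for the undisclosed equation.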
Applicants
LEXIS-NEXIS;HUMPHREY, TIMOTHY, L.;LU, X., ALLAN;WILTSHIRE, JAMES, S., JR.;MORELOCK, JOHN, T.;COLLIAS, SPIRO, G.;AHMED, SALAHUDDIN |
Inventors
HUMPHREY, TIMOTHY, L.;LU, X., ALLAN;WILTSHIRE, JAMES, S., JR.;MORELOCK, JOHN, T.;COLLIAS, SPIRO, G.;AHMED, SALAHUDDIN |