Title of invention Computer implemented method for the automatic classification of instrumental citations
Abstract The learning method taught in this patent document differs significantly from previous methods for the automatic classification of citations, which are labor intensive and subject to human bias and error. The present invention automatically generates and avoids these limitations. A set of operational definitions and features uniquely suited to the scientific literature is disclosed, along with their use in a learning method capable of analyzing the textual content of articles together with bibliometric data to accurately classify instrumental citations.
Publication number US8832002(B2) Publication date 2014.09.09
Application number US200912614320 Filing date 2009.11.06
Applicant Inventor Fu Lawrence; Aliferis Konstantinos (Constantin) F.
Classification G06F15/18 Main classification G06F15/18
Agency Agent Weinberger Laurence A.
Principal claim 1. A computer implemented method for automatically classifying citations in a document database, comprising the following steps:
A. identifying influential features for the citations in the database comprising the following steps:
  1) selecting appropriate input features for training;
  2) selecting citations from the database for analysis to form a learning corpus;
  3) acquiring data for input features;
  4) formatting input features for learning comprising the following steps:
    a) data preprocessing;
    b) feature weighting; and
    c) feature scaling;
  5) labeling citations in the learning corpus in view of a gold standard reference;
  6) selecting a learning method;
  7) training the learning method further comprising the following steps:
    a) acquiring the labeled citations in the learning corpus;
    b) employing model selection for finding the best models for the corpus that take into account the correlations among citations in the same document by selecting one or more citations per document; and
    c) deriving unbiased error estimates by applying error estimators that take into account the correlations among citations in the same document;
  8) storing the output of the learning method; and
  9) ranking the features as determined by the learning method; and
B. applying the learned influential features to classify additional citations in a database comprising the steps of:
  1) selecting citations for analysis from the database, not including citations used in the learning corpus, to form an application corpus;
  2) acquiring data for input features;
  3) formatting input features for learning comprising the following steps:
    a) data preprocessing;
    b) feature weighting; and
    c) feature scaling;
  4) classifying the citations by applying the stored output of the learning method of step A.(8) to the citations; and
  5) outputting the classification results for the citations.
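Steps A.7(b) and A.7(c) require that model selection and error estimation account for correlations among citations in the same document. The claim does not name a specific estimator, so the following is only a minimal sketch of one common way to respect that constraint: cross-validation folds built per document, so that no document contributes citations to both the training and the test side of a split. The names `grouped_folds`, `grouped_error_estimate`, and `majority_learner` are hypothetical and do not come from the patent; the majority-class learner is a trivial stand-in for whatever learning method is selected in step A.6.

```python
from collections import Counter, defaultdict


def grouped_folds(doc_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which all citations of a given
    document fall on the same side of the split (assumed strategy)."""
    by_doc = defaultdict(list)
    for idx, doc in enumerate(doc_ids):
        by_doc[doc].append(idx)
    docs = sorted(by_doc)  # deterministic document order
    if n_folds < 2 or n_folds > len(docs):
        raise ValueError("n_folds must be between 2 and the number of documents")
    for k in range(n_folds):
        test_docs = set(docs[k::n_folds])  # every n_folds-th document
        test = [i for d in docs if d in test_docs for i in by_doc[d]]
        train = [i for d in docs if d not in test_docs for i in by_doc[d]]
        yield train, test


def majority_learner(features, labels):
    """Hypothetical placeholder learner: always predicts the most frequent
    training label and ignores the features."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda feature_row: most_common


def grouped_error_estimate(features, labels, doc_ids, learner, n_folds):
    """Pooled misclassification rate over document-grouped folds."""
    errors = 0
    for train, test in grouped_folds(doc_ids, n_folds):
        predict = learner([features[i] for i in train],
                          [labels[i] for i in train])
        errors += sum(predict(features[i]) != labels[i] for i in test)
    return errors / len(labels)
```

Called as `grouped_error_estimate(features, labels, doc_ids, majority_learner, n_folds=3)`, the function returns a single error rate in which each citation is scored exactly once by a model that never saw any citation from the same document. Splitting by citation instead would let near-duplicate context from one article leak into both sides of a split, which tends to make the estimate optimistic.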
Address