Abstract |
A simple statistical model is derived and validated that predicts the distribution of false matches between peaks in matrix-assisted laser desorption/ionization (MALDI) mass spectrometry data and proteins in proteome databases. Because the data are cluttered and incomplete, neither simple ranking nor simple hypothesis testing is likely to be sufficient for truly robust microorganism identification over a large number of candidate microorganisms. To make microorganism identification more robust, the proteome databases are restricted to data for a given set of proteins rather than all proteins. Restricting the databases in this way makes the model more robust, i.e., it decreases the number of false matches. |
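The effect of restricting the database can be illustrated with a minimal sketch. The abstract does not state the form of the model, so the code below assumes a generic random-match model: protein masses are taken to be uniformly distributed over the observed mass range, and a peak counts as matched when it lies within a fixed tolerance of at least one database mass. The function names, the uniform-mass assumption, and all numerical values are hypothetical and chosen only for illustration.

```python
def p_false_match(n_proteins, tol_da, mass_min, mass_max):
    """Chance that one unrelated peak falls within +/- tol_da of at least
    one of n_proteins database masses.

    Assumption (not from the source): database masses are independent and
    uniformly distributed over [mass_min, mass_max].
    """
    # Probability that a single database mass lands in the peak's tolerance window.
    p_single = min(1.0, 2.0 * tol_da / (mass_max - mass_min))
    # Complement of "no database mass lands in the window".
    return 1.0 - (1.0 - p_single) ** n_proteins


def expected_false_matches(n_peaks, n_proteins, tol_da, mass_min, mass_max):
    """Expected number of chance matches for a spectrum with n_peaks peaks."""
    return n_peaks * p_false_match(n_proteins, tol_da, mass_min, mass_max)


# Hypothetical numbers: 20 peaks, 5 Da tolerance, 4-20 kDa mass range.
full_db = expected_false_matches(20, 4000, 5.0, 4000.0, 20000.0)
restricted_db = expected_false_matches(20, 50, 5.0, 4000.0, 20000.0)
print(f"full database:       {full_db:.2f} expected false matches")
print(f"restricted database: {restricted_db:.2f} expected false matches")
```

Under these assumptions, the expected number of chance matches grows with the number of database entries per organism, so limiting each proteome to a selected set of proteins lowers it.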