The frequency of occurrence of a representation in a collection of documents is estimated, for document retrieval purposes, by identifying the actual frequency of occurrence (actual f_i) of the representation in a sample of n_i documents, calculating the maximum (f_max) and minimum (f_min) probable frequencies of occurrence of the representation in the collection, and computing the difference between them. If the difference does not exceed a limit, the midpoint of the maximum and minimum probable frequencies (f_mean) is taken as the estimated frequency of occurrence of the representation. Document distribution probabilities are optimized, and probability thresholds are established for identifying documents. An initial probability threshold is set and then adjusted as probabilities are scored for the documents in each sample. The document result list (170) is adjusted iteratively across the samples.
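The abstract does not state how f_max and f_min are derived from the sample. As a minimal sketch, assuming they are the bounds of a Wilson score confidence interval on the sample proportion f_i/n_i, scaled to a collection of N documents (an assumption, not the stated method), the estimate step could look like the following. The names `probable_frequency_bounds`, `estimate_frequency`, `collection_size`, `limit`, and `z` are hypothetical, as is returning `None` when the difference exceeds the limit.

```python
import math


def probable_frequency_bounds(f_i, n_i, collection_size, z=1.96):
    """Hypothetical (f_min, f_max): a Wilson score interval on the sample
    proportion f_i / n_i, scaled up to the whole collection."""
    if n_i <= 0 or not 0 <= f_i <= n_i:
        raise ValueError("need n_i > 0 and 0 <= f_i <= n_i")
    p = f_i / n_i                      # observed proportion in the sample
    z2 = z * z
    denom = 1 + z2 / n_i
    centre = (p + z2 / (2 * n_i)) / denom
    half = z * math.sqrt(p * (1 - p) / n_i + z2 / (4 * n_i * n_i)) / denom
    # Clamp to [0, 1] only to guard against floating-point drift.
    f_min = max(0.0, centre - half) * collection_size
    f_max = min(1.0, centre + half) * collection_size
    return f_min, f_max


def estimate_frequency(f_i, n_i, collection_size, limit, z=1.96):
    """Return f_mean if f_max - f_min <= limit, else None.

    Returning None is an assumption: the caller would presumably
    draw a larger sample and try again."""
    f_min, f_max = probable_frequency_bounds(f_i, n_i, collection_size, z)
    if f_max - f_min <= limit:
        return (f_min + f_max) / 2     # midpoint, i.e. f_mean
    return None
```

For example, 30 occurrences in a 1,000-document sample of a 1,000,000-document collection give an interval roughly 21,400 occurrences wide, so `estimate_frequency(30, 1000, 1_000_000, limit=25_000)` returns a midpoint near 31,800, while `limit=20_000` returns `None`.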