Abstract |
The present invention relates to methods and algorithms that can be used to identify sequence motifs that are either under- or over-represented in a given nucleotide sequence as compared to the frequency of those sequences that would be expected to occur by chance, or that are either under- or over-represented as compared to the frequency of those sequences that occur in other nucleotide sequences, and to methods of scoring sequences based on the occurrence of these sequence motifs. Such sequence motifs may be biologically significant, for example they may constitute transcription factor binding sites, mRNA stability/instability signals, epigenetic signals, and the like. The methods of the invention can also be used, inter alia, to classify sequences or organisms in terms of their phylogenetic relationships, or to identify the likely host of a pathogenic organism. The methods of the present invention can also be used to optimize expression of proteins.
|
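The idea of comparing observed motif frequencies with those expected by chance can be illustrated with a minimal sketch. It assumes a zero-order background model in which each base occurs independently at its overall frequency in the sequence; the abstract does not specify the background model, so this choice, the function name `motif_representation`, and the parameter `k` are hypothetical and serve illustration only.

```python
from collections import Counter
from itertools import product
from typing import Dict


def motif_representation(seq: str, k: int = 2) -> Dict[str, float]:
    """Return the observed/expected count ratio for every k-mer over ACGT.

    Assumption (not from the source): the expected count comes from a
    zero-order model, i.e. the product of single-base frequencies times
    the number of k-length windows in the sequence.
    """
    seq = seq.upper()
    n = len(seq)
    windows = n - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")

    # Single-base frequencies estimated from the sequence itself.
    base_freq = {b: seq.count(b) / n for b in "ACGT"}

    # Observed counts of every overlapping k-mer.
    observed = Counter(seq[i:i + k] for i in range(windows))

    ratios = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        p = 1.0
        for b in kmer:
            p *= base_freq[b]
        expected = p * windows
        if expected > 0:
            # Ratio > 1 suggests over-representation, < 1 under-representation.
            ratios[kmer] = observed[kmer] / expected
    return ratios


if __name__ == "__main__":
    ratios = motif_representation("ACGTACGTACGT", k=2)
    for kmer, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(kmer, round(ratio, 3))
```

A ratio well above 1 marks a motif as a candidate for over-representation and a ratio well below 1 as a candidate for under-representation. A practical scoring method would likely use a richer background model and a significance test, neither of which is shown here.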