Abstract |
PROBLEM TO BE SOLVED: To predict gene regions, i.e., the protein-coding and non-coding regions of a DNA or RNA sequence, by computer using a codon frequency table. SOLUTION: The method for predicting protein-coding regions is based on the finding that, of the six reading frames of a DNA or RNA sequence, the correct reading frame is the one in which the codon counts within a partial sequence most closely match the codon frequencies characteristic of the species; this finding was confirmed on human and E. coli data. The correct reading frame is selected using a high-frequency codon group and a low-frequency codon group: in the correct reading frame, the number of occurrences of the high-frequency group is at a maximum and the number of occurrences of the low-frequency group is at a minimum. A protein-coding region is predicted as the interval in which an ORF overlaps the region showing that maximum or minimum.
|
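The frame-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The codon groups `HIGH_FREQ` and `LOW_FREQ` are hypothetical placeholders (real groups would be derived from a species-specific codon frequency table), the function names are invented for illustration, and combining the two counts as "high minus low" is an assumption made here to obtain a single ranking. The ORF-overlap step is omitted.

```python
# Hypothetical codon groups, for illustration only. Real groups would be
# taken from a codon frequency table of the species under study.
HIGH_FREQ = {"CTG", "GAA", "GCG", "AAA"}
LOW_FREQ = {"AGG", "AGA", "CTA", "ATA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete codons, starting at the given offset (0-2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def frame_scores(seq):
    """Count high- and low-frequency codons in each of the six reading frames.

    Returns a dict mapping (strand, offset) to (high_count, low_count).
    RNA input is handled by converting U to T.
    """
    seq = seq.upper().replace("U", "T")
    scores = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            cs = codons(s, offset)
            high = sum(c in HIGH_FREQ for c in cs)
            low = sum(c in LOW_FREQ for c in cs)
            scores[(strand, offset)] = (high, low)
    return scores


def best_frame(seq):
    """Pick the frame with the most high- and fewest low-frequency codons.

    Assumption: the two criteria are combined as high_count - low_count.
    """
    scores = frame_scores(seq)
    return max(scores, key=lambda k: scores[k][0] - scores[k][1])


if __name__ == "__main__":
    example = "CTGGAAGCGAAA" * 3
    print(best_frame(example))
    print(frame_scores(example))
```

In practice the counts would be taken over a sliding window along the sequence, so that the maximum and minimum regions can be located and then intersected with ORFs; the sketch scores the whole input as a single window for brevity.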