Abstract |
When a query sequence is compared with a subject sequence in order to search the subject sequence for similar regions, the homology search can be conducted with higher accuracy than with existing methods. After genome-scale sequence data for the query sequence and the subject sequence are acquired, each sequence is compressed by converting every homopolymer region, consisting of two or more consecutive bases of a single kind, into a single base of the same kind, yielding a compressed query sequence and a compressed subject sequence. The compressed sequences are then compared with each other, and the search is narrowed to the partial sequences of the compressed subject sequence that match the compressed query sequence. For each of the narrowed-down compressed candidate sequences and the query sequence, the numbers of consecutive bases are compared base by base between the two compressed sequences, using the counts of consecutive bases of a single kind observed in the respective uncompressed sequences. From the degree of agreement or disagreement in these counts, a similarity indicating the homology of each candidate sequence to the query sequence is computed. Based on these similarities, an arbitrary number of candidate sequences with relatively high homology to the query sequence are ranked and selected. In this way, a homology search can be conducted with high accuracy while avoiding the effect of the number of consecutive bases of a single kind in a homopolymer. |
Applicant |
RESEARCH ORGANIZATION OF INFORMATION AND SYSTEMS;GOJOBORI, TAKASHI;IKEO, KAZUHO;OKAYAMA, TOSHITSUGU |
Inventor |
GOJOBORI, TAKASHI;IKEO, KAZUHO;OKAYAMA, TOSHITSUGU |
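
The procedure summarized in the abstract (homopolymer compression, matching on the compressed sequences, then comparing run lengths) can be illustrated with a minimal Python sketch. This is not the applicants' implementation. The function names `compress`, `similarity`, and `search` are hypothetical, candidates are found by exact matching of the compressed strings, and the score (fraction of positions whose run lengths agree) is an assumed stand-in for the similarity measure, which the abstract does not specify. Run lengths at the candidate's boundaries are compared as-is, which is a simplification.

```python
from itertools import groupby


def compress(seq):
    """Collapse each homopolymer run to one base.

    Returns the compressed string and the list of original run lengths.
    """
    bases, runs = [], []
    for base, group in groupby(seq):
        bases.append(base)
        runs.append(sum(1 for _ in group))
    return "".join(bases), runs


def similarity(query_runs, cand_runs):
    """Assumed score: fraction of positions with identical run lengths."""
    matches = sum(1 for q, c in zip(query_runs, cand_runs) if q == c)
    return matches / len(query_runs)


def search(query, subject, top_n=3):
    """Rank candidate regions of `subject` by run-length agreement with `query`.

    Returns (score, start) pairs, where `start` indexes the compressed subject.
    """
    if not query:
        return []
    cq, q_runs = compress(query)
    cs, s_runs = compress(subject)
    hits = []
    start = cs.find(cq)
    while start != -1:
        # Run lengths of the subject region aligned to the compressed query.
        cand_runs = s_runs[start:start + len(cq)]
        hits.append((similarity(q_runs, cand_runs), start))
        start = cs.find(cq, start + 1)
    # Highest score first; ties broken by position.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:top_n]


if __name__ == "__main__":
    print(compress("AAACCGTT"))
    print(search("AACGG", "TAACGGGTACCGGA"))
```

In this sketch both occurrences of the compressed query `ACG` in the compressed subject become candidates, and the one whose run lengths agree at more positions ranks first.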