Title of invention: Annotation of genome sequences
Abstract: A method of identifying one or more proteins in an unannotated DNA sequence is disclosed. The method involves dividing the DNA sequence into a plurality of sequence fragments of substantially the same length (about 300 to 5000 base pairs, most typically 1000 to 1050 base pairs). A six-frame translation is then performed on each DNA sequence fragment to obtain six translated amino acid sequence fragments per DNA sequence fragment. Each translated sequence fragment is subjected to theoretical digestion to obtain a plurality of cleaved peptide sequences. Next, empirical data for peptide fragments from a protein digested in the same manner as the theoretical digestion are compared with the theoretical data generated in the digestion step for each translated sequence fragment, to identify one or more translated sequence fragments that include a substantial number of the peptides present in the digested protein. The sequence fragment with the greatest number of theoretical peptide masses correlating to the empirical data indicates the likely location of the protein of interest in the DNA sequence. To avoid the problem of the sequence being divided at the site of a protein, the DNA sequence is duplicated, and the original and the duplicate are split in such a manner that the sequence fragments from the duplicate overlap the cuts in the original genome sequence.
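The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record does not name the digestion enzyme, so trypsin (cleavage after K or R, not before P) is assumed; peptide masses are monoisotopic; the 0.5 Da tolerance, the half-fragment offset for the duplicate copy, and every function name are hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons; one water is added per peptide.
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    # Unknown codons (e.g. containing N) become "X".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translate(dna):
    """Three forward and three reverse-strand reading frames."""
    return [translate(strand[off:])
            for strand in (dna, reverse_complement(dna))
            for off in range(3)]

def tryptic_digest(protein):
    """Theoretical digestion (assumed enzyme: trypsin); '*' ends a chain."""
    peptides = []
    for segment in protein.split("*"):
        start = 0
        for i, aa in enumerate(segment):
            if aa in "KR" and segment[i + 1:i + 2] != "P":
                peptides.append(segment[start:i + 1])
                start = i + 1
        if start < len(segment):
            peptides.append(segment[start:])
    return peptides

def peptide_mass(peptide):
    return sum(MASS[aa] for aa in peptide) + WATER

def split_fragments(dna, size, offset=0):
    """Cut dna into (start, fragment) pieces; a nonzero offset shifts the cuts."""
    starts = ([0] if offset else []) + list(range(offset, len(dna), size))
    ends = starts[1:] + [len(dna)]
    return [(s, dna[s:e]) for s, e in zip(starts, ends)]

def score_fragment(fragment, observed, tol=0.5):
    """Best single-frame count of observed masses matched by a theoretical peptide."""
    best = 0
    for protein in six_frame_translate(fragment):
        masses = [peptide_mass(p) for p in tryptic_digest(protein)
                  if set(p) <= MASS.keys()]
        hits = sum(any(abs(m - obs) <= tol for m in masses) for obs in observed)
        best = max(best, hits)
    return best

def locate(dna, observed, size=1000, tol=0.5):
    """Return (start, score) of the best fragment over both tilings."""
    # The second tiling plays the role of the duplicate sequence: its cuts
    # are shifted by half a fragment so its pieces span the original cuts.
    candidates = split_fragments(dna, size) + split_fragments(dna, size, size // 2)
    scored = [(start, score_fragment(frag, observed, tol))
              for start, frag in candidates]
    return max(scored, key=lambda item: item[1])
```

Given a list of experimentally measured peptide masses, `locate(dna, observed)` returns the start coordinate and hit count of the highest-scoring fragment. Scoring each frame separately, rather than pooling hits across all six frames, is one possible reading of "translated sequence fragments which include a substantial number of peptides"; other scoring schemes would fit the abstract equally well.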
Publication number: US2006210972(A1) Publication date: 2006.09.21
Application number: US20050507257 Filing date: 2005.04.27
Applicant: PROTEOME SYSTEMS INTELLECTUAL PROPERTY PTY LTD Inventors: ARTHUR JONATHAN W.;WILKINS MARC;TRAINI MATHEW D.
Classification: C12Q1/68;C07K5/103;C07K14/35;C07K14/47;G06F19/00 Main classification: C12Q1/68
Agency: Agent:
Principal claim:
Address: