Independent claim |
1. A method of characterizing one or more microorganisms, the method comprising:
accessing a file comprising one or more digital DNA sequences;
selecting, by a computer system, a digital file comprising the file, wherein each of the one or more digital DNA sequences corresponds to a microorganism;
dividing, using the computer system, the digital file into a plurality of file portions, wherein each file portion can be processed by a processing core;
segmenting, by the computer system, each of the one or more digital DNA sequences in the plurality of file portions into one or more first portions;
performing, by the computer system, a first set of alignments by comparing the one or more first portions to information stored in a first database;
determining, by the computer system, sequence portions from among the one or more first portions that have an alignment match to the information stored in the first database;
segmenting each of the one or more digital DNA sequences into one or more second portions using a window of a window size;
for each DNA sequence of the one or more digital DNA sequences, performing, by the computer system, a set of iterative alignment actions including:
performing a second set of alignments by comparing the one or more second portions to information stored in a second database;
determining whether the comparison failed to produce at least one alignment match between any second portion of the one or more second portions and information stored in the second database;
when it is determined that the comparison failed to produce at least one alignment match between any second portion of the one or more second portions and information stored in the second database and when the window size has not decreased beyond a designated stringency level:
decreasing the window size;
repeating the segmenting of the DNA sequence into one or more second portions using a window of the decreased window size; and
repeating the set of iterative alignment actions; and
characterizing one or more microorganisms based on:
a first result of the determination of sequence portions from the one or more first portions that have an alignment match to the information stored in the first database; and
a second result of the performance of the set of second-database alignment actions. |
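The iterative alignment actions recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that a "database" is a plain dictionary mapping a reference name to a reference sequence, that an "alignment match" is an exact substring hit, that windows are consecutive and non-overlapping, and that the window shrinks by a fixed step. The names `segment`, `align`, `iterative_alignment`, `min_window`, and `step` are hypothetical and do not appear in the claim.

```python
def segment(sequence, window_size):
    """Split a sequence into consecutive, non-overlapping full-length windows."""
    return [
        sequence[i:i + window_size]
        for i in range(0, len(sequence) - window_size + 1, window_size)
    ]


def align(portions, database):
    """Return (portion, reference_name) pairs for exact substring hits.

    Assumption: exact matching stands in for a real alignment tool.
    """
    return [
        (portion, name)
        for portion in portions
        for name, reference in database.items()
        if portion in reference
    ]


def iterative_alignment(sequence, database, window_size, min_window, step):
    """Shrink the window until a portion matches or the floor is reached.

    min_window plays the role of the designated stringency level (assumed).
    Returns the final window size and the list of matches found at that size.
    """
    while True:
        matches = align(segment(sequence, window_size), database)
        # Stop on any match, or when shrinking again would pass the floor.
        if matches or window_size - step < min_window:
            return window_size, matches
        window_size -= step


# Toy second database and query sequence (illustrative data only).
second_db = {"org_a": "ACGTTTGACC", "org_b": "AAGGGCCC"}
final_size, hits = iterative_alignment(
    "TTGAGGGC", second_db, window_size=8, min_window=4, step=2
)
```

In this toy run the windows of size 8 and 6 produce no match, so the window is decreased until portions of size 4 match both references. A sequence with no match at any permitted size returns an empty list once the floor is reached.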