Title of invention: Methods for the identification of textual and physical structured query fragments for the analysis of textual and biopolymer information
Abstract: We disclose a combinatorial, hierarchical process that uses "process-patterns" in one preferred embodiment to identify, classify, and compare substrings within strings; and in another preferred embodiment to identify, classify, compare, generate, and separate fragments derived from one or more physical samples of polynucleotides. These substrings (and their physical polynucleotide counterparts) are called "partition" fragments, and the process-pattern-defined derivatives that some, but not all, "partition" fragments may yield are called "structured query fragments" (SQFs). A process-pattern is both: (i) an ordered set of short "target" sites (one from each major search class) that must be present (and whose higher-ranked members of the same major search class must not have any sites) within the relevant search area of a partition fragment, and (ii) a step-wise delimitation process (where each step has a defined polarity and occurs after a target is found) that restricts the region of a partition fragment where the next class-specific, pre-emptive target-search takes place. In one preferred embodiment, the computer software disclosed herein locates the process-patterns and SQFs of interest within the partition fragments in the string(s) under study (e.g., a set of polynucleotide sequence data), stores the results, and provides access to this data through database query and analysis tools. These computational analyses are emulated by another preferred embodiment using physical samples of polynucleotides and the laboratory methods disclosed herein. In the latter, sequence-specific, double-stranded cleavage effectors utilize as substrates and generate as products progressively expanding sets of asymmetrically end-immobilized DNA, a process that ultimately yields extremely large numbers of individually distinguishable SQFs (called "ranged" SQFs) with lengths between 100 and 700 nucleotides.
In almost all cases, the known process-pattern and observed length of an experimentally obtained ranged SQF provide sufficient information for the computer software disclosed herein to map the ranged SQF automatically to its partition fragment (and location) within a set of polynucleotide sequence data that characterizes the physical sample(s) of polynucleotides under study.
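The search-and-delimit idea described in the abstract can be illustrated with a minimal Python sketch. This is not the software disclosed in the application. The names `Step`, `find_sqf`, and `map_by_length`, the choice of which site occurrence bounds the kept region, the exclusion of the site itself from that region, and the recognition sequences in the usage note are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical process-pattern (illustrative only)."""
    ranked_sites: Tuple[str, ...]  # members of one search class, highest rank first
    target: str                    # the member this pattern requires at this step
    keep: str                      # polarity: "left" or "right"


def find_sqf(fragment: str, steps: List[Step]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the region left after all steps, or None.

    Assumption: "left" keeps the part before the leftmost site, "right"
    keeps the part after the rightmost site, and the site is excluded.
    """
    lo, hi = 0, len(fragment)
    for step in steps:
        region = fragment[lo:hi]
        # Pre-emptive search: the highest-ranked member with any site wins,
        # so a higher-ranked site anywhere in the region blocks the target.
        winner = next((s for s in step.ranked_sites if s in region), None)
        if winner != step.target:
            return None
        if step.keep == "left":
            hi = lo + region.find(winner)
        else:
            lo = lo + region.rfind(winner) + len(winner)
    return lo, hi


def map_by_length(fragments: List[str], steps: List[Step], length: int) -> List[int]:
    """Indices of partition fragments whose SQF has the observed length."""
    hits = []
    for i, frag in enumerate(fragments):
        span = find_sqf(frag, steps)
        if span is not None and span[1] - span[0] == length:
            hits.append(i)
    return hits
```

As a usage example under the same assumptions, a two-step pattern such as `[Step(("GCGGCCGC", "GAATTC"), "GAATTC", "right"), Step(("GGATCC",), "GGATCC", "left")]` applied to `"AAGAATTCTTTTGGATCCAA"` leaves the region between the two sites, while any fragment that also contains the higher-ranked `GCGGCCGC` site is rejected at the first step. `map_by_length` then mirrors the mapping described above: given a known pattern and an observed length, it lists the candidate partition fragments, ideally exactly one.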
Publication number: US2002177138(A1) Publication date: 2002.11.28
Application number: US20010991013 Filing date: 2001.11.14
Applicant: THE UNITED STATES OF AMERICA, REPRESENTED BY THE SECRETARY, DEPARTMENT OF HEALTH AND HUMAN SERVICES Inventor: BOISSY ROBERT J.
Classification: C12Q1/68;(IPC1-7):C12Q1/68;G06F19/00;G01N33/48;G01N33/50 Main classification: C12Q1/68
Agency: Agent:
Principal claim:
Address: