Title of invention: Method, system and software arrangement for detecting or determining similarity regions between datasets
Abstract: Methods, systems, and computer-readable media are provided which can identify and provide local variations in regions of similarity among two or more data sets. These data sets may be represented as sequences such as, e.g., genomic sequences or words in a text. The local variations in similarity levels can be provided by selecting an initial prior distribution relating the data sets, organizing the first data set into windows and the remaining data sets into blocks, using the priors to sample one or more sets of words from the first data set, and computing a similarity curve from exact and inexact matches for these words. If the results have not converged, a new set of priors is computed and the sampling and the computation of similarity curves are repeated. The computations can be performed in an amount of time that is linearly proportional to the size of the data sets. The exemplary embodiments of the present invention can use Bayesian estimators to determine local variations in similarity levels and to refine estimates of the probabilistic distributions between iterations.
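The loop outlined in the abstract (prior, windows, sampled words, similarity curve, convergence check, updated prior) can be illustrated with a much simpler stand-in. The sketch below is not the claimed method: it assumes a Beta(1, 1) prior per window, samples word positions uniformly inside each window rather than from the prior, counts exact matches only, and does not divide the second data set into blocks. The function name `similarity_curve` and all parameters (`word_len`, `window`, `samples`, `max_iter`, `tol`, `seed`) are hypothetical and chosen for illustration.

```python
import random


def similarity_curve(seq_a, seq_b, word_len=8, window=50, samples=20,
                     max_iter=10, tol=1e-3, seed=0):
    """Toy per-window estimate of how much of seq_a also occurs in seq_b.

    Assumption: a Beta-Bernoulli model of exact word matches, used here
    only as a simple example of a Bayesian estimator refined by iteration.
    """
    rng = random.Random(seed)

    # Index every word of seq_b in a single pass over the sequence.
    words_b = {seq_b[i:i + word_len]
               for i in range(len(seq_b) - word_len + 1)}

    # Start positions of the windows over seq_a.
    last_start = len(seq_a) - word_len + 1
    starts = list(range(0, last_start, window))
    if not starts:
        return []

    # Beta(1, 1) prior per window, stored as pseudo-counts.
    hits = [1.0] * len(starts)
    misses = [1.0] * len(starts)
    curve = [0.5] * len(starts)

    for _ in range(max_iter):
        for w, start in enumerate(starts):
            stop = min(start + window, last_start)
            for _ in range(samples):
                # Sample one word from this window and test for an exact match.
                pos = rng.randrange(start, stop)
                if seq_a[pos:pos + word_len] in words_b:
                    hits[w] += 1
                else:
                    misses[w] += 1

        # Posterior mean of each window's match probability; the updated
        # counts serve as the prior for the next iteration.
        new_curve = [h / (h + m) for h, m in zip(hits, misses)]
        delta = max(abs(a - b) for a, b in zip(curve, new_curve))
        curve = new_curve
        if delta < tol:  # stop once the curve no longer changes noticeably
            break

    return curve
```

Calling `similarity_curve("ACGT" * 25 + "T" * 100, "ACGT" * 50)` returns one value per window of the first sequence, with values near 1 where the sampled words also occur in the second sequence and values near 0 where they do not. Each iteration costs a fixed number of hash lookups per window, so the work grows roughly in proportion to the sequence lengths.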
Publication No.: US2008046187(A1) Publication date: 2008.02.21
Application No.: US20060410692 Filing date: 2006.04.24
Applicant: NEW YORK UNIVERSITY Inventors: PAXIA SALVATORE; MISHRA BHUBANESWAR; ZHOU YI
Classification: G06F19/00 Main classification: G06F19/00
Agency: Agent:
Principal claim:
Address: