Title of invention METHOD AND APPARATUS FOR COMBINING DATA OF BIOLOGICAL SEQUENCES INTO A NON-REDUNDANT DATA SOURCE
Abstract The invention provides a method for establishing or modifying a data source comprising a plurality of entries related to biological sequences, the entries being non-redundant with regard to said sequences, on the basis of a plurality of data sets of one or more basic data sources, each of said data sets comprising a biological sequence, said method comprising the steps of:
- retrieving, for one or more data sets, a biological sequence contained in the data set and generating a hash key from the biological sequence thus retrieved by applying a collision-free hash function, said hash function mapping the data representing said sequence onto a message of a length shorter than the length of the original data representing the sequence;
- for each of said data sets, adding information for retrieving information from said data set to an entry in a reference data source uniquely related to the hash key generated from said sequence contained in said data set, wherein, if said reference data source does not comprise an entry related to said hash key, a new entry is provided in said reference data source which comprises one unique hash key and information for retrieving the data set or data sets comprising the sequence from which said hash key was generated, such that each entry in said reference data source is uniquely identified by a hash key generated from a sequence.
The invention also relates to a corresponding computer system and a method of updating a non-redundant data source using a reference data source.
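The two steps of the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation: SHA-256 is assumed here as a stand-in for the "collision-free" hash function (it is collision-resistant in practice, not provably collision-free), the whitespace and case normalization is an added assumption, and the names `sequence_key`, `build_reference`, and the sample accessions are hypothetical.

```python
import hashlib


def sequence_key(sequence: str) -> str:
    """Return a fixed-length hash key for a biological sequence.

    Assumption: whitespace and letter case carry no meaning, so they are
    normalized away before hashing. The abstract does not specify this.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_reference(data_sets):
    """Build a reference data source keyed by sequence hash.

    Each data set is a (source_name, accession, sequence) tuple. Every
    entry maps one hash key to the retrieval information (source and
    accession) of all data sets that contain the same sequence.
    """
    reference = {}
    for source_name, accession, sequence in data_sets:
        key = sequence_key(sequence)
        # A new entry is created only if the key is not yet present;
        # otherwise the retrieval information joins the existing entry.
        reference.setdefault(key, []).append((source_name, accession))
    return reference


# Hypothetical data sets from two basic data sources; the first two
# carry the same sequence and therefore collapse into one entry.
data_sets = [
    ("source_a", "A001", "MKTAYIAKQR"),
    ("source_b", "B042", "mktayiakqr"),
    ("source_a", "A002", "MSLLTEVETY"),
]
reference = build_reference(data_sets)
```

Because entries are looked up by hash key rather than by comparing full sequences, adding a data set costs one hash computation and one dictionary lookup, regardless of how many sequences are already stored.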
Publication number WO2004013769(A2) Publication date 2004.02.12
Application number WO2003EP07743 Filing date 2003.07.16
Applicant LION BIOSCIENCE AG; OHR, CHRISTIAN Inventor OHR, CHRISTIAN
Classification G06F17/30; G06F19/28 Main classification G06F17/30
Agency Agent
Principal claim
Address