Invention Publication
EP1387292A1 Method and apparatus for combining data of biological sequences into a non-redundant data source
Under examination (published)
- Patent Title: Method and apparatus for combining data of biological sequences into a non-redundant data source
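- Patent Title (Chinese): 用于生物序列的数据组合到一个非冗余数据源的方法和装置 (Method and apparatus for combining data of biological sequences into a non-redundant data source)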
- Application No.: EP02016796.1
- Application Date: 2002-07-26
- Publication No.: EP1387292A1
- Publication Date: 2004-02-04
- Inventor: Ohr, Christian
- Applicant: LION bioscience AG
- Applicant Address: Waldhofer Strasse 98 69123 Heidelberg DE
- Assignee: LION bioscience AG
- Current Assignee: LION bioscience AG
- Current Assignee Address: Waldhofer Strasse 98 69123 Heidelberg DE
- Agency: Schohe, Stefan
- Main IPC: G06F17/30
- IPC: G06F17/30 ; G06F19/00
Abstract:
The invention provides a method for establishing or modifying a data source comprising a plurality of entries related to biological sequences that are non-redundant with regard to said sequences on the basis of a plurality of data sets of one or more basic data sources, each of said data sets comprising a biological sequence, said method comprising the steps of:
retrieving for one or more data sets a biological sequence contained in the data set and generating a hash key from the biological sequence thus retrieved by applying a collision-free hash function, said hash function mapping the data representing said sequence onto a message of a length shorter than the length of the original data representing the sequence,
for each of said data sets, adding information for retrieving information from said data set to an entry in a reference data source uniquely related to the hash key generated from said sequence contained in said data set, wherein a new entry in said reference data source is provided which comprises one unique hash key and information for retrieving the data set or data sets comprising the sequence from which said hash key was generated, if said reference data source does not comprise an entry related to said hash key,
such that each entry in said reference data source is uniquely identified by a hash key generated from a sequence.
The invention also relates to a corresponding computer system and a method of updating a non-redundant data source using a reference data source.
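The steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: SHA-256 is assumed here as a stand-in for the "collision-free" hash function (collisions are only practically improbable, not impossible), the whitespace and case normalization is an added assumption, and the names `sequence_key`, `build_reference`, and the `(source_name, entry_id, sequence)` record layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Return a fixed-length hex digest for a biological sequence.

    SHA-256 is an assumption of this sketch; the source only requires a
    hash whose output is shorter than the sequence data it represents.
    """
    # Assumed normalization: drop whitespace and ignore letter case.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_reference(data_sets):
    """Map each hash key to the locators of all data sets sharing that sequence.

    `data_sets` is an iterable of (source_name, entry_id, sequence) tuples,
    a hypothetical record layout chosen for illustration.
    """
    reference = defaultdict(list)
    for source_name, entry_id, sequence in data_sets:
        key = sequence_key(sequence)
        # An unseen key creates a new entry; a known key gains one more locator.
        reference[key].append((source_name, entry_id))
    return dict(reference)


# Two of the three hypothetical data sets carry the same sequence.
data_sets = [
    ("sourceA", "A001", "MKTAYIAKQR"),
    ("sourceB", "B417", "mktayiakqr"),
    ("sourceB", "B418", "MKTAYIAKQRQ"),
]
reference = build_reference(data_sets)
```

In this sketch the reference holds one entry per distinct sequence, and each entry lists every data set in which that sequence occurs, so the locators can be used to fetch the full records from the basic data sources.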