Published patent application
EP1387292A1 Method and apparatus for combining data of biological sequences into a non-redundant data source
Status: under examination (published)
- Title: Method and apparatus for combining data of biological sequences into a non-redundant data source
- Application number: EP02016796.1
- Filing date: 2002-07-26
- Publication number: EP1387292A1
- Publication date: 2004-02-04
- Inventor: Ohr, Christian
- Applicant: LION bioscience AG
- Applicant address: Waldhofer Strasse 98 69123 Heidelberg DE
- Patent holder: LION bioscience AG
- Current patent holder: LION bioscience AG
- Current patent holder address: Waldhofer Strasse 98 69123 Heidelberg DE
- Representative: Schohe, Stefan
- Main classification: G06F17/30
- IPC classification: G06F17/30; G06F19/00
Abstract:
The invention provides a method for establishing or modifying a data source comprising a plurality of entries related to biological sequences that are non-redundant with regard to said sequences on the basis of a plurality of data sets of one or more basic data sources, each of said data sets comprising a biological sequence, said method comprising the steps of:
retrieving for one or more data sets a biological sequence contained in the data set and generating a hash key from the biological sequence thus retrieved by applying a collision-free hash function, said hash function mapping the data representing said sequence onto a message of a length shorter than the length of the original data representing the sequence,
for each of said data sets, adding information for retrieving information from said data set to an entry in a reference data source uniquely related to the hash key generated from said sequence contained in said data set, wherein a new entry in said reference data source is provided which comprises one unique hash key and information for retrieving the data set or data sets comprising the sequence from which said hash key was generated, if said reference data source does not comprise an entry related to said hash key,
such that each entry in said reference data source is uniquely identified by a hash key generated from a sequence.
The invention also relates to a corresponding computer system and a method of updating a non-redundant data source using a reference data source.
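The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The choice of SHA-256, the whitespace and case normalization, the in-memory dictionary standing in for the reference data source, and the names `sequence_key`, `add_data_set`, `source_a` and `source_b` are all assumptions made for illustration. SHA-256 is collision-resistant in practice rather than strictly collision-free.

```python
import hashlib


def sequence_key(sequence: str) -> str:
    """Map a sequence to a fixed-length hash key (SHA-256 is an assumption)."""
    # Assumed normalization: drop whitespace and ignore case so that
    # formatting differences do not produce different keys.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def add_data_set(reference: dict, source: str, entry_id: str, sequence: str):
    """Record where a data set lives under the hash key of its sequence.

    A new entry is created only if the key is not yet present, so the
    reference holds exactly one entry per distinct sequence.
    """
    key = sequence_key(sequence)
    is_new = key not in reference
    locators = reference.setdefault(key, [])
    locator = (source, entry_id)  # information for retrieving the data set
    if locator not in locators:
        locators.append(locator)
    return key, is_new


# Hypothetical data sets from two basic data sources; two share a sequence.
reference = {}
data_sets = [
    ("source_a", "A001", "MKTAYIAKQR"),
    ("source_b", "B042", "MKTAYIAKQR"),
    ("source_b", "B043", "MSLNEQ"),
]
for source, entry_id, seq in data_sets:
    add_data_set(reference, source, entry_id, seq)
```

Here the two data sets with the same sequence end up under one key, each contributing its own locator, while the third sequence gets its own entry. The toy sequences are shorter than the 64-character hex digest. The abstract's requirement that the key be shorter than the sequence data would only hold for realistically long sequences.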