Method and apparatus for combining data of biological sequences into a non-redundant data source

Invention Publication

EP1387292A1 Method and apparatus for combining data of biological sequences into a non-redundant data source (status: under examination, published)



Patent Title: Method and apparatus for combining data of biological sequences into a non-redundant data source
Application No.: EP02016796.1

Application Date: 2002-07-26
Publication No.: EP1387292A1

Publication Date: 2004-02-04
Inventor: Ohr, Christian
Applicant: LION bioscience AG
Applicant Address: Waldhofer Strasse 98 69123 Heidelberg DE
Assignee: LION bioscience AG
Current Assignee: LION bioscience AG
Current Assignee Address: Waldhofer Strasse 98 69123 Heidelberg DE
Agency: Schohe, Stefan
Main IPC: G06F17/30
IPC: G06F17/30 ; G06F19/00


Abstract:

The invention provides a method for establishing or modifying a data source comprising a plurality of entries related to biological sequences that are non-redundant with regard to said sequences on the basis of a plurality of data sets of one or more basic data sources, each of said data sets comprising a biological sequence, said method comprising the steps of:

retrieving for one or more data sets a biological sequence contained in the data set and generating a hash key from the biological sequence thus retrieved by applying a collision-free hash function, said hash function mapping the data representing said sequence onto a message of a length shorter than the length of the original data representing the sequence,
for each of said data sets, adding information for retrieving information from said data set to an entry in a reference data source uniquely related to the hash key generated from said sequence contained in said data set, wherein a new entry in said reference data source is provided which comprises one unique hash key and information for retrieving the data set or data sets comprising the sequence from which said hash key was generated, if said reference data source does not comprise an entry related to said hash key,
such that each entry in said reference data source is uniquely identified by a hash key generated from a sequence.
The invention also relates to a corresponding computer system and a method of updating a non-redundant data source using a reference data source.
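
The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name a hash function, so SHA-256 is assumed here as a stand-in for the "collision-free hash function", and the sequence normalization (whitespace removal, upper-casing) is likewise an assumption. The names `sequence_key`, `build_reference`, and the example data sources are hypothetical.

```python
import hashlib


def sequence_key(sequence: str) -> str:
    """Map a biological sequence onto a fixed-length hash key.

    Assumption: SHA-256 stands in for the hash function of the abstract;
    the source does not say which function is used.
    """
    # Assumption: strip whitespace and ignore case so that formatting
    # differences between data sources do not produce different keys.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def build_reference(data_sets):
    """Build a reference data source keyed by sequence hash.

    `data_sets` is an iterable of (source_name, entry_id, sequence) tuples.
    Each reference entry holds the retrieval information (source name and
    entry id) of every data set that contains the same sequence.
    """
    reference = {}
    for source_name, entry_id, sequence in data_sets:
        key = sequence_key(sequence)
        # A new entry is created only if the key is not yet present;
        # otherwise the retrieval information is added to the existing entry.
        reference.setdefault(key, []).append((source_name, entry_id))
    return reference


# Hypothetical data sets from two basic data sources.
data_sets = [
    ("sourceA", "A001", "MKTAYIAKQR"),
    ("sourceB", "B042", "mktayiakqr"),        # same sequence, different source
    ("sourceB", "B043", "MKTAYIAKQRQISFVK"),  # a different sequence
]

reference = build_reference(data_sets)
```

In this sketch the two data sets with the same sequence share one reference entry, so the reference data source stays non-redundant with regard to sequences while still pointing back to every originating data set.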

