Method, system and software arrangement for detecting or determining similarity regions between datasets

Patent application

US20080046187A1 (legal status: valid)

Application number: US11410692
Filing date: 2006-04-24
Publication number: US20080046187A1
Publication date: 2008-02-21
Inventors: Salvatore Paxia, Bhubaneswar Mishra, Yi Zhou
Applicants: Salvatore Paxia, Bhubaneswar Mishra, Yi Zhou
Assignee: New York University
Current assignee: New York University
Main classification: G06F19/00
IPC classification: G06F19/00

Abstract:

Methods, systems, and computer-readable media are provided which can identify and provide local variations in regions of similarity among two or more data sets. These data sets may be represented as sequences such as, e.g., genomic sequences or words in a text. The local variations in similarity levels can be provided by selecting an initial prior distribution relating the data sets, organizing the first data set into windows and the remaining data sets into blocks, using the priors to sample one or more sets of words from the first data set, computing a similarity curve from exact and inexact matches for these words and, if convergence of results is not achieved, computing a new set of priors and repeating the sampling and computation of similarity curves. The computations can be performed using an amount of computational time that is linearly proportional to the size of the data sets. The exemplary embodiments of the present invention can use Bayesian estimators to determine local variations in similarity levels and to refine estimates of the probabilistic distributions between iterations.
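The iterative loop described in the abstract (priors, windows, word sampling, exact and inexact matches, a similarity curve, repetition until convergence) can be illustrated with a small toy sketch. The abstract does not specify the prior family, how inexact matches are defined or weighted, how the priors steer sampling, or the convergence test, so everything below is an assumption rather than the patented method: a Beta prior per window, words of length `k` from the first sequence, a hash set of the second sequence's words standing in for its "blocks", an inexact match meaning one substitution (credited with a hypothetical `inexact_weight`), uniform sampling within each window, and convergence when the curve changes by less than `tol`. All function and parameter names are hypothetical.

```python
import random


def build_index(seq, k):
    """Hash every length-k word of `seq` (stands in for the other data set's blocks)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def near_match(word, index, alphabet):
    """Assumed inexact match: some word exactly one substitution away is indexed."""
    for i, c in enumerate(word):
        for a in alphabet:
            if a != c and word[:i] + a + word[i + 1:] in index:
                return True
    return False


def similarity_curve(seq_a, seq_b, k=8, window=50, samples=10, prior=(1.0, 1.0),
                     inexact_weight=0.5, tol=0.01, max_iter=50, alphabet="ACGT", seed=0):
    """Return (per-window posterior-mean similarity, number of passes used)."""
    rng = random.Random(seed)
    index = build_index(seq_b, k)                # built once, linear in len(seq_b)
    n_words = len(seq_a) - k + 1
    if n_words <= 0:
        return [], 0
    starts = list(range(0, n_words, window))     # window boundaries in seq_a
    alpha = [prior[0]] * len(starts)             # Beta pseudo-counts for "similar"
    beta = [prior[1]] * len(starts)              # Beta pseudo-counts for "dissimilar"
    curve = [a / (a + b) for a, b in zip(alpha, beta)]
    for it in range(1, max_iter + 1):
        for w, lo in enumerate(starts):
            hi = min(lo + window, n_words)
            for _ in range(samples):
                pos = rng.randrange(lo, hi)      # sample one word from this window
                word = seq_a[pos:pos + k]
                if word in index:
                    score = 1.0                  # exact match
                elif near_match(word, index, alphabet):
                    score = inexact_weight       # inexact match, partial credit
                else:
                    score = 0.0
                # alpha/beta now hold the posterior, which is the next pass's prior.
                alpha[w] += score
                beta[w] += 1.0 - score
        new_curve = [a / (a + b) for a, b in zip(alpha, beta)]
        delta = max(abs(x - y) for x, y in zip(new_curve, curve))
        curve = new_curve
        if delta < tol:                          # assumed convergence test
            return curve, it
    return curve, max_iter
```

For fixed `k`, `window` and `samples`, each pass touches a constant number of words per window, so the work per pass grows linearly with the sequence lengths, in line with the linear-time claim. For example, `similarity_curve(s, s)` on a random 500-letter DNA string yields ten windows whose values approach 1, while `similarity_curve("A" * 300, "C" * 300)` yields values near 0.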

Published/granted documents:

US09390163B2 Method, system and software arrangement for detecting or determining similarity regions between datasets (publication/grant date: 2016-07-12)