Patent application
US20080046187A1 Method, system and software arrangement for detecting or determining similarity regions between datasets
Status: Active
- Patent title: Method, system and software arrangement for detecting or determining similarity regions between datasets
- Patent title (Chinese): 用于检测或确定数据集之间的相似区域的方法,系统和软件布置
- Application number: US11410692
- Filing date: 2006-04-24
- Publication number: US20080046187A1
- Publication date: 2008-02-21
- Inventors: Salvatore Paxia, Bhubaneswar Mishra, Yi Zhou
- Applicants: Salvatore Paxia, Bhubaneswar Mishra, Yi Zhou
- Assignee: New York University
- Current assignee: New York University
- Main classification: G06F19/00
- IPC classification: G06F19/00
Abstract:
Methods, systems, and computer-readable media are provided which can identify and provide local variations in regions of similarity among two or more data sets. These data sets may be represented as sequences such as, e.g., genomic sequences or words in a text. The local variations in similarity levels can be provided by selecting an initial prior distribution relating the data sets, organizing the first data set into windows and the remaining data sets into blocks, using the priors to sample one or more sets of words from the first data set, computing a similarity curve from exact and inexact matches for these words and, if convergence of results is not achieved, computing a new set of priors and repeating the sampling and computation of similarity curves. The computations can be performed using an amount of computational time that is linearly proportional to the size of the data sets. The exemplary embodiments of the present invention can use Bayesian estimators to determine local variations in similarity levels and to refine estimates of the probabilistic distributions between iterations.
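The loop described in the abstract (choose a prior, split the first data set into windows, sample words, score exact and inexact matches, update the prior, repeat until the estimates settle) can be illustrated with a rough sketch. This is not the patented method. The function names, the parameters `window`, `k`, `samples`, `strength`, `max_iter` and `tol`, the half-weight given to one-substitution matches, and the shrinkage formula modeled on a Beta posterior mean are all assumptions made for illustration. The organization of the second data set into blocks is omitted. A hash-based word index keeps the lookup cost roughly linear in the input size.

```python
import random


def kmers(seq, k):
    """Return the set of all length-k words occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def neighbors(word, alphabet):
    """Yield every word that differs from `word` at exactly one position."""
    for i, ch in enumerate(word):
        for sub in alphabet:
            if sub != ch:
                yield word[:i] + sub + word[i + 1:]


def similarity_curve(first, second, window=50, k=8, samples=20,
                     strength=2.0, max_iter=10, tol=1e-3, seed=0):
    """Estimate one similarity value per window of `first` (illustrative only)."""
    rng = random.Random(seed)
    index = kmers(second, k)                      # hash index of the second data set
    alphabet = sorted(set(first) | set(second))
    windows = [first[i:i + window] for i in range(0, len(first), window)]
    windows = [w for w in windows if len(w) >= k]  # drop windows too short to sample
    if not windows:
        return []

    prior_mean = 0.5                              # assumed initial prior
    curve = []
    for _iteration in range(max_iter):
        curve = []
        for w in windows:
            score = 0.0
            for _sample in range(samples):
                start = rng.randrange(len(w) - k + 1)
                word = w[start:start + k]
                if word in index:                 # exact match
                    score += 1.0
                elif any(n in index for n in neighbors(word, alphabet)):
                    score += 0.5                  # inexact match (assumed weight)
            # Shrink the observed match rate toward the current prior mean.
            curve.append((prior_mean * strength + score) / (strength + samples))
        new_mean = sum(curve) / len(curve)
        if abs(new_mean - prior_mean) < tol:      # estimates have converged
            break
        prior_mean = new_mean                     # refine the prior and repeat
    return curve
```

Calling `similarity_curve(a, b)` on two strings returns a list with one value per window of `a`. Values near 1 mark windows whose sampled words mostly occur in `b`, and values near 0 mark windows with few or no matches.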