-
Publication number: US07730316B1
Publication date: 2010-06-01
Application number: US11525312
Filing date: 2006-09-22
Applicant: Jonathan Baccash
Inventor: Jonathan Baccash
IPC: G06F21/00
CPC classification number: G06F21/64, G06F2221/2149
Abstract: Methods and computer program products for creating sketches of a document, which are compared with sketches of other documents in order to determine the documents' degree of similarity. A sketch is a digest of information from random locations within a document. A document is divided into a set of shingles. Each shingle is converted into a set of fingerprints. A sketch is determined based on the one-bit fingerprints thus created. In order to create additional sketches of the document, a new set of fingerprints is created by randomization techniques.
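Illustration (not taken from the patent text): the shingle, fingerprint, and sketch steps described in the abstract can be approximated with 1-bit minwise hashing, a well-known technique substituted here for demonstration. It may differ from what the patent actually claims. The function names, the 3-word shingle size, the 256-bit sketch length, and the use of seeded BLAKE2b as the randomized fingerprint are all assumptions made for this sketch.

```python
import hashlib


def shingles(text, k=3):
    """Split text into the set of overlapping k-word shingles (k is an assumed default)."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(shingle, seed):
    """64-bit fingerprint of a shingle; the seed stands in for the randomization step."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, n_bits=256, k=3):
    """For each seed, keep only the lowest bit of the minimum fingerprint."""
    sh = shingles(text, k)
    return [min(fingerprint(s, seed) for s in sh) & 1 for seed in range(n_bits)]


def estimated_similarity(a, b):
    """Estimate Jaccard similarity from two equal-length one-bit sketches.

    Bits agree with probability J + (1 - J) / 2, so J is about 2 * agree - 1.
    """
    agree = sum(x == y for x, y in zip(a, b)) / len(a)
    return max(0.0, 2 * agree - 1)
```

Calling `estimated_similarity(sketch(doc_a), sketch(doc_b))` gives a rough similarity in the range 0 to 1. Using a different range of seeds would produce an additional, independent sketch of the same document.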
-
2.
Publication number: US20130110407A1
Publication date: 2013-05-02
Application number: US13621716
Filing date: 2012-09-17
Applicant: Jonathan Baccash, Aaron Halpern, Chao Tian, Krishna Pant, Paolo Carnevali
Inventor: Jonathan Baccash, Aaron Halpern, Chao Tian, Krishna Pant, Paolo Carnevali
IPC: G06F17/18
Abstract: After DNA fragments are sequenced and mapped to a reference, various hypotheses for the sequences in a variant region can be scored to find which sequence hypotheses are more likely. A hypothesis can include a specific variable fraction for the plurality of alleles that comprise the sequence hypothesis in the region. A likelihood of each hypothesis can be determined using a probability that accounts for the fraction of the alleles specified in the respective sequence hypothesis. Thus, other hypotheses besides standard homozygous and equal heterozygous (i.e., one chromosome with A and one with B in a cell) can be explored by explicitly including the variable fractions of the alleles as a parameter in the optimization. Also, a variant score can be determined for a variant relative to a reference. The variant score can be used to determine a variant calibrated score indicating a likelihood that the variant call is correct.
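Illustration (not taken from the application text): scoring sequence hypotheses that carry an explicit allele fraction can be shown with a simplified single-position model. Everything here is an assumption made for demonstration: the function names, the uniform 1% base-error rate, the treatment of reads as independent, and the reduction of a "sequence hypothesis" to a mapping from allele to fraction. The application's actual scoring and calibration procedure is not reproduced.

```python
import math


def read_prob(base, alleles, error=0.01):
    """P(observed base | hypothesis); `alleles` maps allele -> fraction (fractions sum to 1)."""
    return sum(
        frac * ((1 - error) if base == allele else error / 3)
        for allele, frac in alleles.items()
    )


def log_likelihood(reads, alleles, error=0.01):
    """Log-likelihood of all observed bases, assuming independent reads."""
    return sum(math.log(read_prob(base, alleles, error)) for base in reads)


def best_hypothesis(reads, hypotheses, error=0.01):
    """Return the name of the hypothesis with the highest log-likelihood."""
    return max(
        hypotheses,
        key=lambda name: log_likelihood(reads, hypotheses[name], error),
    )


# Hypothetical pileup at one position: 80 reads show A, 20 show G.
reads = "A" * 80 + "G" * 20
hypotheses = {
    "homozygous_A": {"A": 1.0},
    "equal_het_AG": {"A": 0.5, "G": 0.5},
    "variable_80_20": {"A": 0.8, "G": 0.2},
}
winner = best_hypothesis(reads, hypotheses)
```

With this hypothetical pileup, the 80/20 variable-fraction hypothesis scores higher than either the homozygous or the equal heterozygous hypothesis, which is the kind of case the abstract says standard hypotheses would miss.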
-