Abstract:
An ultra-fast, privacy-preserving solution to the problem of comparing genomes across sequencing technologies and genome freezes is presented. The method transforms a standard genome representation (i.e., a list of variants relative to a reference) into a "fingerprint" of the genome, without requiring knowledge of the technology, reference, or encoding used. The resulting fingerprints can be readily compared to ascertain relatedness between two genome representations. Because the fingerprints are much smaller than the original representations, computation on them is fast and requires little memory. This enables scaling up a variety of important genome analyses, including determining degree of relatedness, recognizing duplicate sequenced genomes in a set, and many others. Because the original genome representation cannot be reconstructed from its fingerprint, the method also has significant implications for privacy-preserving genome analytics.
Abstract:
An ultra-fast, privacy-preserving solution to the problem of comparing genotypes across genotyping technologies is presented. The method transforms a standard genotype representation (i.e., a list of alleles associated with IDs representing single nucleotide variants) into a "fingerprint" of the genotype, without requiring knowledge of the SNP chip technology. The resulting fingerprints can be readily compared to ascertain relatedness between two genotypes, even if the genotypes were created using different SNP chip designs. Because the fingerprints are much smaller than the original representations, computation on them is fast and requires little memory. This enables scaling up a variety of important genotype analyses, including determining degree of relatedness, recognizing duplicate sequenced genotypes in a set, and many others. Because the original genotype representation cannot be reconstructed from its fingerprint, the method also has significant implications for privacy-preserving genotype analytics.