Abstract:
Our research on the genome sequences of more than 250 species (including viruses, microbes, multicellular organisms, and humans) shows that the occurrence of a particular subsequence (a so-called “motif” or “n-mer,” where n, the length of the subsequence, can reach 25 or more) in the genome of a given species can be treated as a nearly random event, and that the occurrences of a particular subsequence in the genomes of different species can be treated as nearly independent events (except when extremely closely related species are compared). The set of subsequences that occur in a particular species' genome can therefore be used as a genomic “fingerprint” of that species. This discovery leads to the concept of using a set of pseudo-randomly designed subsequences for species identification or discrimination. These subsequences (probes, primers, motifs, n-mers) can be used with hybridization-based technologies (including, but not limited to, microarray or PCR technologies) and with any other technology that can detect the presence or absence of a particular subsequence in genomic DNA. The same approach can also be used to identify individuals of the same species (including humans), to estimate the genome size of unknown organisms, and to estimate the total genome size in samples containing several viral, microbial, and eukaryotic genomes. The identification methods currently in use for these purposes require sequencing the genomes of the species or individuals of interest. The proposed computational method eliminates this requirement and will greatly reduce the cost of these tests.
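The fingerprint idea described above can be illustrated with a minimal Python sketch. It is not the method of the abstract itself: the function names (`random_probes`, `fingerprint`, `identify`, `estimate_genome_size`) are hypothetical, probe detection is simulated by exact substring search on one strand, and the genome-size estimate assumes an idealized model in which bases are independent and uniformly distributed.

```python
import math
import random


def random_probes(n_probes, length, seed=0):
    """Generate pseudo-random n-mers (hypothetical probe design step)."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n_probes)]


def fingerprint(genome, probes):
    """Presence/absence vector: one boolean per probe.

    Simplification: exact match on the given strand only; a real
    hybridization assay would also see the reverse complement.
    """
    return tuple(probe in genome for probe in probes)


def identify(sample, references, probes):
    """Return the reference name whose fingerprint differs least from the sample's."""
    sample_fp = fingerprint(sample, probes)

    def distance(name):
        ref_fp = fingerprint(references[name], probes)
        return sum(a != b for a, b in zip(sample_fp, ref_fp))

    return min(references, key=distance)


def estimate_genome_size(fraction_present, n):
    """Estimate genome length from the fraction of n-mer probes detected.

    Assumed model: a given n-mer is present with probability about
    1 - exp(-G / 4**n), so G is about -4**n * ln(1 - fraction_present).
    """
    if not 0 <= fraction_present < 1:
        raise ValueError("fraction_present must be in [0, 1)")
    return -(4 ** n) * math.log(1 - fraction_present)
```

In this sketch, `identify` compares fingerprints by counting the positions where they differ. When nearly every probe is detected, the size estimate becomes unreliable, so the probe length n would need to be chosen with the expected genome size in mind.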
Abstract:
Processes for identifying whether any parasite or other organism is present in a host, comprising: a. scanning for non-host signatures; b. scanning for one-error-removed non-host signatures; c. scanning for N-error-removed non-host signatures, where N is selected to give the desired statistical certainty of the presence or absence of any parasite in the host. Algorithms useful for such detection, and listings of specific “signatures” (sequences or subsequences) for identifying specific microorganisms, are also provided.
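The abstract does not define "N-error-removed" precisely. One plausible reading, assumed here, is that a signature is matched while tolerating up to N substitution errors. The Python sketch below illustrates steps a to c under that assumption. The names `hamming`, `scan`, and `tiered_scan` are hypothetical, and errors are counted as Hamming distance only (no insertions or deletions).

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan(sample, signatures, max_errors):
    """List (name, position, errors) for every window of the sample that
    matches a signature with at most max_errors substitutions."""
    hits = []
    for name, signature in signatures.items():
        k = len(signature)
        for i in range(len(sample) - k + 1):
            errors = hamming(sample[i:i + k], signature)
            if errors <= max_errors:
                hits.append((name, i, errors))
    return hits


def tiered_scan(sample, signatures, n):
    """Scan with 0, 1, ..., n tolerated errors and stop at the first tier
    that yields a hit. Returns (tier, hits), or (None, []) if nothing
    matches within n errors."""
    for tier in range(n + 1):
        hits = scan(sample, signatures, tier)
        if hits:
            return tier, hits
    return None, []
```

Raising n makes the scan more sensitive to diverged parasite sequences but also more likely to match host sequence by chance, which is why the abstract ties the choice of N to the desired statistical certainty.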
Abstract:
A method for identifying non-host nucleic acid sequences using sequence data. The method can include sequencing a sample to obtain sequences, associating the sequences with a host genome, and excluding any sequences that are associated with the host genome. The method can then associate the remaining sequences with any known genomes and exclude any that are associated with a known genome. The sequences that remain can be used as seed sequences to assemble a non-host nucleic acid.
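The two-stage subtraction described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method: "associating" a read with a genome is approximated by sharing at least one exact k-mer with it (a real pipeline would more likely use a read aligner), the names `kmers`, `subtract`, and `seed_reads` are hypothetical, and the final assembly step is not shown.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract(reads, genomes, k):
    """Keep only reads that share no exact k-mer with any of the genomes.

    Reads shorter than k have no k-mers and are therefore kept.
    """
    index = set()
    for genome in genomes:
        index |= kmers(genome, k)
    return [read for read in reads if not (kmers(read, k) & index)]


def seed_reads(reads, host, known_genomes, k=8):
    """Host subtraction followed by known-genome subtraction.

    The reads that survive both filters are candidate seed sequences
    for assembling an unknown, non-host nucleic acid.
    """
    non_host = subtract(reads, [host], k)
    return subtract(non_host, known_genomes, k)
```

Filtering against the host first mirrors the order given in the abstract. In practice the host usually accounts for most reads, so removing them early also shrinks the input to the second filter.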