Abstract:
A system and method for performing similarity searching are disclosed. The system includes a programmable logic device configured to implement a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of data strings and a plurality of substrings of a query string. The pipeline may further include an ungapped extension prefilter stage located downstream from the matching stage, the prefilter stage being configured to sift through pattern matches between the data strings and the plurality of substrings of the query string and to score them so that only pattern matches whose scores exceed a user-defined threshold pass downstream from the prefilter stage. The matching stage may include at least one Bloom filter.
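A software sketch may help make the two stages concrete. The Python below is illustrative only and is not the disclosed hardware design: the names `BloomFilter`, `matching_stage`, `ungapped_score` and `prefilter_stage`, the word length `w`, the window parameter `flank`, the +1/-3 scoring values, and the exact dictionary lookup used to discard Bloom filter false positives are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, word):
        # Derive num_hashes bit positions by salting a single hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos] = 1

    def __contains__(self, word):
        return all(self.bits[pos] for pos in self._positions(word))


def matching_stage(query, stream, w):
    """Yield (query_pos, stream_pos) for each w-mer of the stream found in the query."""
    bloom = BloomFilter()
    index = {}
    for q in range(len(query) - w + 1):
        word = query[q:q + w]
        bloom.add(word)
        index.setdefault(word, []).append(q)
    for s in range(len(stream) - w + 1):
        word = stream[s:s + w]
        if word in bloom:  # cheap probabilistic membership test
            for q in index.get(word, []):  # exact lookup drops false positives
                yield q, s


def ungapped_score(query, stream, q, s, w, flank, match=1, mismatch=-3):
    """Best-scoring contiguous run in a fixed window around the seed (no gaps)."""
    lo = -min(flank, q, s)
    hi = min(w + flank, len(query) - q, len(stream) - s)
    best = running = 0
    for k in range(lo, hi):
        running += match if query[q + k] == stream[s + k] else mismatch
        running = max(running, 0)  # restart the run once it goes negative
        best = max(best, running)
    return best


def prefilter_stage(query, stream, seeds, w, flank, threshold):
    """Pass downstream only the seeds whose ungapped score exceeds the threshold."""
    for q, s in seeds:
        score = ungapped_score(query, stream, q, s, w, flank)
        if score > threshold:
            yield q, s, score
```

In this sketch, `prefilter_stage(query, stream, matching_stage(query, stream, 4), 4, 4, 5)` chains the two stages for 4-character words and a threshold of 5. Raising the threshold reduces the number of matches handed to later, more expensive stages.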
Abstract:
An apparatus and method for performing similarity searching on a data stream with respect to a query string are disclosed, where the data stream comprises a plurality of data substrings, and where the query string comprises a plurality of query substrings. A programmable logic device is used to filter the data stream to find a plurality of possible matches between the data substrings and a plurality of the query substrings, wherein the data substrings and the query substrings comprise a plurality of characters. From these possible matches, a determination can be made as to a similarity between the query string and at least a portion of the data stream.
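The filtering and similarity determination described above can be illustrated in software. The following Python is a minimal sketch under stated assumptions, not the disclosed apparatus: the function names `possible_matches` and `best_diagonal`, the fixed substring length `w`, and the use of hit counts per diagonal (stream position minus query position) as a crude similarity indicator are all hypothetical choices made for the example.

```python
from collections import Counter


def possible_matches(query, stream, w):
    """Pair every w-character data substring with the equal query substrings."""
    index = {}
    for q in range(len(query) - w + 1):
        index.setdefault(query[q:q + w], []).append(q)
    window = ""
    for s_end, ch in enumerate(stream):  # consume the stream one character at a time
        window = (window + ch)[-w:]  # keep only the last w characters
        if len(window) == w:
            for q in index.get(window, []):
                yield q, s_end - w + 1  # (query position, stream position)


def best_diagonal(matches):
    """Return (diagonal, hit_count) for the diagonal with the most matches, or None."""
    counts = Counter(s - q for q, s in matches)
    return counts.most_common(1)[0] if counts else None
```

Because `possible_matches` reads its input one character at a time, `stream` may be any iterable of characters, such as a generator over a file. Many hits on a single diagonal suggest that a contiguous portion of the data stream resembles the query string.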