Abstract:
A system and method for performing biological sequence similarity searching is disclosed. The system includes a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of biological sequence data strings and a plurality of substrings of a query string. The pipeline may further include an ungapped extension prefilter stage located downstream from the matching stage, the prefilter stage being configured to sift through pattern matches between the biological sequence data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user-defined score will pass downstream from the prefilter stage. The matching stage may include at least one Bloom filter.
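The ungapped extension prefilter can be pictured with a small software sketch. It makes several assumptions that are not in the abstract. Pattern matches arrive as (query position, database position) pairs for w-character seeds. Scoring is BLASTN-style, with +1 for a match and -3 for a mismatch. Each seed is extended left and right along its diagonal without gaps, and a direction stops once the running score falls more than `x_drop` below its best. That X-drop rule is swapped in here for illustration and is not necessarily what the hardware does. Only hits whose score exceeds the user-defined threshold are passed on. The names `ungapped_score`, `prefilter`, and `x_drop` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

MATCH, MISMATCH = 1, -3  # assumed BLASTN-style scores, for illustration only


def _extend(query: str, db: str, i: int, j: int, step: int, x_drop: int) -> int:
    """Best cumulative score gain walking from (i, j) along the diagonal in direction `step`."""
    run = best = 0
    while 0 <= i < len(query) and 0 <= j < len(db):
        run += MATCH if query[i] == db[j] else MISMATCH
        best = max(best, run)
        if best - run > x_drop:  # X-drop: stop once the score falls too far below the best
            break
        i += step
        j += step
    return best


def ungapped_score(query: str, db: str, q_pos: int, d_pos: int, w: int, x_drop: int = 10) -> int:
    """Score a w-character seed hit and extend it left and right without gaps."""
    seed = sum(MATCH if query[q_pos + k] == db[d_pos + k] else MISMATCH for k in range(w))
    right = _extend(query, db, q_pos + w, d_pos + w, +1, x_drop)
    left = _extend(query, db, q_pos - 1, d_pos - 1, -1, x_drop)
    return seed + left + right


def prefilter(hits: Iterable[Tuple[int, int]], query: str, db: str, w: int,
              threshold: int, x_drop: int = 10) -> Iterator[Tuple[int, int]]:
    """Pass downstream only the hits whose ungapped score exceeds the user threshold."""
    for q_pos, d_pos in hits:
        if ungapped_score(query, db, q_pos, d_pos, w, x_drop) > threshold:
            yield q_pos, d_pos
```

For example, the hit (0, 2) between query `ACGTACGTAA` and database `TTACGTACGTAATT` extends to a score of 10 and survives a threshold of 8. The spurious hit (0, 0) scores -12 and is dropped.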
Abstract:
A system and method for performing similarity searching is disclosed. The system includes a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of data strings and a plurality of substrings of a query string. The pipeline may further include an ungapped extension prefilter stage located downstream from the matching stage, the prefilter stage being configured to sift through pattern matches between the data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user-defined score will pass downstream from the prefilter stage. The matching stage may include at least one Bloom filter.
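A matching stage built on one Bloom filter can be modeled in software roughly as follows. The sketch assumes that every length-w substring (w-mer) of the query is programmed into the filter, and that each w-character window of the data stream is then tested against it. The bit indexes are sliced from a single SHA-256 digest purely for convenience. `BloomFilter`, `build_query_filter`, and `matching_stage` are hypothetical names, and the filter size `m_bits` and hash count `k` are arbitrary.

```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    """Bit-array Bloom filter; k bit indexes are sliced from one SHA-256 digest (so k <= 8)."""

    def __init__(self, m_bits: int, k: int) -> None:
        if not 1 <= k <= 8:
            raise ValueError("this sketch supports 1..8 hash functions")
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _indexes(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for ix in self._indexes(item):
            self.bits[ix // 8] |= 1 << (ix % 8)

    def __contains__(self, item: str) -> bool:
        # May answer True for items never added (false positive), never False for added ones.
        return all((self.bits[ix // 8] >> (ix % 8)) & 1 for ix in self._indexes(item))


def build_query_filter(query: str, w: int, m_bits: int = 1 << 12, k: int = 4) -> BloomFilter:
    """Program every length-w substring of the query into the filter."""
    bf = BloomFilter(m_bits, k)
    for i in range(len(query) - w + 1):
        bf.add(query[i:i + w])
    return bf


def matching_stage(stream: str, w: int, bf: BloomFilter) -> Iterator[Tuple[int, str]]:
    """Yield (position, window) for every w-character window of the stream the filter accepts."""
    for j in range(len(stream) - w + 1):
        window = stream[j:j + w]
        if window in bf:
            yield j, window
```

Because a Bloom filter never misses a programmed w-mer but can occasionally accept one that was never added, its output is only a set of *possible* matches. This is why a downstream stage, such as the ungapped extension prefilter, has to score and discard the spurious ones.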
Abstract:
An apparatus and method for performing similarity searching on a data stream with respect to a query string are disclosed, where the data stream comprises a plurality of data substrings, and where the query string comprises a plurality of query substrings. A programmable logic device is used to filter the data stream to find a plurality of possible matches between the data substrings and a plurality of the query substrings, wherein the data substrings and the query substrings comprise a plurality of characters. From these possible matches, a determination can be made as to a similarity between the query string and at least a portion of the data stream.
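One simple way to go from possible matches to a similarity determination is sketched below. The sketch assumes exact w-character matches, found through a hash index of the query substrings; a Python dict stands in for the programmable-logic filter. It then judges similarity by counting how many matches fall on the same diagonal (data position minus query position). Diagonal counting is a swapped-in heuristic chosen for illustration, not the method stated in the abstract. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def index_query(query: str, w: int) -> Dict[str, List[int]]:
    """Map each length-w query substring to every position where it occurs."""
    idx: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        idx[query[i:i + w]].append(i)
    return idx


def possible_matches(query: str, data: str, w: int) -> List[Tuple[int, int]]:
    """Return (query_pos, data_pos) pairs whose length-w substrings are identical."""
    idx = index_query(query, w)
    return [(q, j) for j in range(len(data) - w + 1) for q in idx.get(data[j:j + w], [])]


def best_diagonal(matches: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return (diagonal, hit_count) for the diagonal with the most hits, or None."""
    counts = Counter(d - q for q, d in matches)
    return counts.most_common(1)[0] if counts else None
```

For query `ACGTAC` and data `TTACGTACGG` with w = 3, four of the six matches share diagonal 2. This suggests that the query aligns with the data starting at offset 2.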
Abstract:
Methods and systems are disclosed for performing parallel membership queries to Bloom filters for Longest Prefix Matching, where address prefix memberships are determined in sets of prefixes sorted by prefix length. Hash tables corresponding to each prefix length are probed from the longest to the shortest match in the resulting match vector, terminating when a match is found or all of the lengths are searched. The performance, as determined by the number of dependent memory accesses per lookup, is held constant for longer address lengths or additional unique address prefix lengths in the forwarding table, given that memory resources scale linearly with the number of prefixes in the forwarding table. Using less than 2 Mb of embedded RAM and a commodity SRAM, the present technique achieves an average of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.
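The lookup flow can be sketched in software under a few assumptions. Prefixes and addresses are written as bit strings. There is one Bloom filter and one hash table (a Python dict) per prefix length. All filters are queried first to form the match vector; in hardware this step runs in parallel. The hash tables are then probed from the longest candidate length to the shortest. `Bloom`, `LpmTable`, and the filter parameters are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class Bloom:
    """Tiny Bloom filter over a Python int bitset; k indexes come from one SHA-256 digest."""

    def __init__(self, m: int = 256, k: int = 3) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _ix(self, key: str) -> List[int]:
        d = hashlib.sha256(key.encode()).digest()
        return [int.from_bytes(d[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for i in self._ix(key):
            self.bits |= 1 << i

    def __contains__(self, key: str) -> bool:
        return all((self.bits >> i) & 1 for i in self._ix(key))


class LpmTable:
    """One Bloom filter and one hash table per prefix length (prefixes as bit strings)."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self.lengths = sorted({len(p) for p in routes}, reverse=True)
        self.blooms = {n: Bloom() for n in self.lengths}
        self.tables: Dict[int, Dict[str, str]] = {n: {} for n in self.lengths}
        for prefix, next_hop in routes.items():
            self.blooms[len(prefix)].add(prefix)
            self.tables[len(prefix)][prefix] = next_hop

    def lookup(self, addr: str) -> Tuple[Optional[str], int]:
        """Return (next_hop or None, number of hash-table probes used)."""
        # Match vector: in hardware all filters are queried in parallel.
        vector = [n for n in self.lengths if n <= len(addr) and addr[:n] in self.blooms[n]]
        probes = 0
        for n in vector:  # longest candidate length first
            probes += 1
            hop = self.tables[n].get(addr[:n])
            if hop is not None:  # a Bloom false positive simply misses here
                return hop, probes
        return None, probes
```

With routes `{"1": "A", "10": "B", "1011": "C"}`, the address `10110000` resolves to `C` on the first probe. A false positive in a longer-length filter would cost only one extra probe and would never change the answer.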
Abstract:
The present invention relates to a method and system for performing parallel membership queries to Bloom filters for Longest Prefix Matching, where address prefix memberships are determined in sets of prefixes sorted by prefix length. Hash tables corresponding to each prefix length are probed from the longest to the shortest match in the resulting match vector, terminating when a match is found or all of the lengths are searched. The performance, as determined by the number of dependent memory accesses per lookup, is held constant for longer address lengths or additional unique address prefix lengths in the forwarding table, given that memory resources scale linearly with the number of prefixes in the forwarding table. Using less than 2 Mb of embedded RAM and a commodity SRAM, the present technique achieves an average of one hash probe per lookup and a worst case of two hash probes and one array access per lookup.
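An average cost close to one hash probe depends on keeping Bloom filter false positives rare. The back-of-the-envelope sketch below rests on two assumptions. It uses the textbook false-positive approximation f ≈ (1 - e^(-kn/m))^k for a filter of m bits holding n prefixes with k hash functions, with k chosen near the optimum (m/n) ln 2. It also assumes that every per-length filter uses the same bits-per-prefix ratio. Each lookup then needs at most one real probe, plus one spurious probe for each other prefix length whose filter false-positives, so the expected probe count is bounded by 1 + (L - 1)f. The numbers used here are illustrative and are not the ones behind the abstract's figures; the function names are hypothetical.

```python
import math


def false_positive_rate(bits_per_item: float, k: int) -> float:
    """Standard Bloom filter approximation (1 - e^{-k n / m})^k, written with m/n = bits_per_item."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


def optimal_k(bits_per_item: float) -> int:
    """Near-optimal number of hash functions, (m/n) ln 2, rounded to an integer >= 1."""
    return max(1, round(bits_per_item * math.log(2)))


def expected_probes_bound(num_lengths: int, bits_per_item: float) -> float:
    """Upper bound on expected hash probes: one real probe plus f per other prefix length."""
    f = false_positive_rate(bits_per_item, optimal_k(bits_per_item))
    return 1.0 + (num_lengths - 1) * f
```

At 10 bits per prefix, f comes to roughly 0.8%. At 16 bits per prefix, even 32 distinct prefix lengths keep the expected count below 1.02 probes per lookup. This bound depends on the bits-per-prefix ratio rather than on the number of prefixes, which is consistent with memory scaling linearly with table size.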
Abstract:
The present invention relates to a method and apparatus based on Bloom filters for detecting predefined signatures (strings of bytes) in a network packet payload. A Bloom filter is a data structure for representing a set of strings in order to support membership queries. Hardware Bloom filters isolate all packets that potentially contain predefined signatures. A separate, independent process eliminates the false positives produced by the Bloom filters. The system is implemented on an FPGA platform, allowing a set of 10,000 strings to be scanned for in network data at a line speed of 2.4 Gbps.
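The two-step scheme can be sketched in software as follows. The abstract does not say how signatures of different lengths are handled, so this sketch assumes one Bloom filter per distinct signature length, each scanned with a sliding window of that length over the payload. An exact set lookup stands in for the independent process that removes false positives. `Bloom`, `build`, and `scan` are hypothetical names, and the filter sizes are arbitrary.

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple


class Bloom:
    """Bloom filter over a Python int bitset; k indexes come from one SHA-256 digest."""

    def __init__(self, m: int = 4096, k: int = 4) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _ix(self, key: bytes) -> List[int]:
        d = hashlib.sha256(key).digest()
        return [int.from_bytes(d[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for i in self._ix(key):
            self.bits |= 1 << i

    def __contains__(self, key: bytes) -> bool:
        return all((self.bits >> i) & 1 for i in self._ix(key))


def build(signatures: Iterable[bytes]) -> Tuple[Dict[int, Bloom], Set[bytes]]:
    """One Bloom filter per signature length, plus an exact set for verification."""
    filters: Dict[int, Bloom] = {}
    exact: Set[bytes] = set()
    for sig in signatures:
        filters.setdefault(len(sig), Bloom()).add(sig)
        exact.add(sig)
    return filters, exact


def scan(payload: bytes, filters: Dict[int, Bloom], exact: Set[bytes]) -> List[Tuple[int, bytes]]:
    """Return (offset, signature) for every confirmed signature occurrence in the payload."""
    hits = []
    for length, bf in filters.items():
        for i in range(len(payload) - length + 1):
            window = payload[i:i + length]
            # Step 1: the Bloom filter flags a possible signature.
            # Step 2: an exact lookup discards Bloom false positives.
            if window in bf and window in exact:
                hits.append((i, window))
    return sorted(hits)
```

In hardware, the Bloom filters for all lengths run in parallel at line rate. The slower exact check only ever sees the small fraction of windows that the filters flag.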
Abstract:
A method and a system allow a user to access several controlled-access accounts by presenting the credentials of only one of them. The method may include (a) storing the credentials for each of the user's accounts; (b) receiving from the user the credentials corresponding to any one of the user's accounts; (c) presenting the received credentials to access the corresponding account; and (d) upon successful access of that account, using the stored credentials to access one or more of the user's other accounts without requiring the user to present the corresponding credentials. The credentials for each of the user's accounts are stored encrypted under a randomly generated key that is common to all the encrypted credentials. In addition, the randomly generated key is encrypted using the credentials of each of the accounts. In this way, neither the random key nor the account credentials need to be stored in plain text.
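The key-wrapping arrangement can be sketched with Python's standard library. All of the following is illustrative. PBKDF2-HMAC-SHA256 turns each account's credentials into a key-encryption key. The cipher is a toy encrypt-then-MAC construction, using a SHAKE-256 keystream and an HMAC-SHA256 tag, chosen only so that the example needs no third-party package. It is not a vetted scheme, and a real system would use an authenticated cipher such as AES-GCM from a maintained library. Actually logging in to the account, step (c), is out of scope here; a successful MAC check on the wrapped key stands in for it. `CredentialVault`, `_seal`, and `_open` are hypothetical names.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple


def _kdf(secret: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte key from account credentials (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", secret, salt, 100_000)


def _subkeys(key: bytes) -> Tuple[bytes, bytes]:
    """Split one key into separate encryption and MAC keys."""
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _seal(key: bytes, plaintext: bytes) -> bytes:
    """TOY encrypt-then-MAC: XOR with a SHAKE-256 keystream, then an HMAC tag. Not production crypto."""
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(16)
    stream = hashlib.shake_256(enc_key + nonce).digest(len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, stream))
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def _open(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on a wrong key or tampering."""
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong credentials or tampered data")
    stream = hashlib.shake_256(enc_key + nonce).digest(len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))


class CredentialVault:
    def __init__(self, credentials: Dict[str, bytes]) -> None:
        master = os.urandom(32)  # random key common to all stored credentials
        self.salt = os.urandom(16)
        # Every account's credentials, encrypted under the shared random key.
        self.sealed = {a: _seal(master, c) for a, c in credentials.items()}
        # The random key itself, encrypted once under each account's credentials.
        self.wrapped = {a: _seal(_kdf(c, self.salt), master) for a, c in credentials.items()}
        # Neither `master` nor any plaintext credential is kept on the object.

    def unlock(self, account: str, presented: bytes) -> Dict[str, bytes]:
        """Credentials for one account recover the shared key, then every stored credential."""
        master = _open(_kdf(presented, self.salt), self.wrapped[account])
        return {a: _open(master, blob) for a, blob in self.sealed.items()}
```

Wrapping one random key per account, rather than encrypting every credential under every other credential, keeps storage linear in the number of accounts. It also means that changing one account's credentials only requires re-wrapping that account's copy of the key.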
Abstract:
A medical concept is learned about or inferred from a medical transcript. A probabilistic model is trained from medical transcripts. For example, the problem is treated as a graphical model. Discriminative or generative learning is used to train the probabilistic model. A mutual information criterion can be employed to identify a discrete set of words or phrases to be used in the probabilistic model. The model is tailored to the types of medical transcripts, focusing on this source of data to output the most probable state of a patient in the medical field or domain. The learned model may be used to infer the state of a medical concept for a patient.
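The mutual information criterion for choosing words can be sketched as follows. The sketch assumes whitespace-tokenized transcripts, a binary label per transcript for the medical concept (for example, a hypothetical smoker / non-smoker state), and word presence as the feature. It scores each word by I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))] and keeps the top k. The function names and toy data are illustrative only.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def mutual_information(docs: Sequence[Set[str]], labels: Sequence[int], word: str) -> float:
    """I(X;Y) in bits between presence of `word` (X) and the transcript label (Y)."""
    n = len(docs)
    joint = Counter((word in d, y) for d, y in zip(docs, labels))
    px = Counter(word in d for d in docs)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_words(transcripts: List[str], labels: List[int], top_k: int) -> List[Tuple[str, float]]:
    """Rank vocabulary words by mutual information with the label; ties broken alphabetically."""
    docs = [set(t.lower().split()) for t in transcripts]
    vocab = sorted(set().union(*docs))
    scored = [(w, mutual_information(docs, labels, w)) for w in vocab]
    scored.sort(key=lambda p: (-p[1], p[0]))
    return scored[:top_k]
```

On the toy set `["smoker cough", "smoker fatigue", "nonsmoker cough", "nonsmoker fatigue"]` with labels `[1, 1, 0, 0]`, the words "nonsmoker" and "smoker" each carry 1 bit about the label, while "cough" and "fatigue" carry none. Only the first two would be kept as model features.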