Abstract:
Systems, methods, and computer program products identify one or more web page impressions satisfying one or more simple queries, each web page impression being associated with a respective impression ID. The impression IDs of the web pages satisfying the simple queries are stored in an impression log. After the impression IDs are stored, a query is received from a client device, and the number of impression IDs for the web pages satisfying that query is identified based on the previously identified web page impressions satisfying the simple queries.
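The abstract leaves the log's structure and the way a later query is answered unspecified. The sketch below is a minimal illustration only. It assumes that a "simple query" is a single term, that the impression log is an in-memory mapping from each simple query to a set of impression IDs, and that a later query is a conjunction of logged simple queries, so its impression count is the size of the intersection of their ID sets. The names `impression_log`, `record_impression`, and `count_impressions` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical in-memory impression log: simple query -> impression IDs of the
# web pages that satisfied it.
impression_log: Dict[str, Set[int]] = defaultdict(set)


def record_impression(simple_query: str, impression_id: int) -> None:
    """Store an impression ID under the simple query its page satisfied."""
    impression_log[simple_query].add(impression_id)


def count_impressions(query_terms: List[str]) -> int:
    """Count impressions for a later query, answered only from the log.

    Assumes the later query is a conjunction of previously logged simple
    queries, so the matching impressions are the intersection of their sets.
    """
    if not query_terms:
        return 0
    ids = set(impression_log.get(query_terms[0], set()))
    for term in query_terms[1:]:
        ids &= impression_log.get(term, set())
    return len(ids)


record_impression("shoes", 1)
record_impression("shoes", 2)
record_impression("red", 2)
record_impression("red", 3)
print(count_impressions(["shoes", "red"]))  # 1 (only impression 2 matches both)
```

Because the counts come from previously stored IDs, the later query never has to be re-run against the pages themselves.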
Abstract:
A system generates a hash value for a fetched document and compares it with a set of stored hash values to identify those stored hash values in which a sequence of bit positions, fewer than all of the bit positions, matches the corresponding sequence of bit positions of the hash value. The system then determines whether any of the identified hash values is substantially similar to the hash value, and identifies the fetched document as a near-duplicate of another document when one of the identified hash values is substantially similar to the hash value.
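The abstract does not name the hash function, the matched bit positions, or the similarity threshold. The following is a minimal sketch under these assumptions: a SimHash-style 64-bit fingerprint (a swapped-in technique that the abstract itself does not mention), the leading 16 bits as the exactly matching "sequence of bit positions", and a Hamming distance of at most 3 as "substantially similar". `PREFIX_BITS`, `MAX_DISTANCE`, and `NearDuplicateIndex` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

HASH_BITS = 64
PREFIX_BITS = 16   # hypothetical: leading bits that must match exactly
MAX_DISTANCE = 3   # hypothetical: Hamming threshold for "substantially similar"


def simhash(text: str) -> int:
    """SimHash-style 64-bit fingerprint over whitespace-separated tokens."""
    weights = [0] * HASH_BITS
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(HASH_BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(HASH_BITS) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions at which a and b differ."""
    return bin(a ^ b).count("1")


class NearDuplicateIndex:
    """Stored hash values bucketed by their leading PREFIX_BITS bits."""

    def __init__(self) -> None:
        self.buckets: Dict[int, List[Tuple[str, int]]] = {}

    @staticmethod
    def _prefix(h: int) -> int:
        return h >> (HASH_BITS - PREFIX_BITS)

    def find_near_duplicate(self, h: int) -> Optional[str]:
        # Step 1: only stored hashes whose prefix matches are candidates.
        # Step 2: a candidate is a near-duplicate if few bits differ overall.
        for doc_id, stored in self.buckets.get(self._prefix(h), []):
            if hamming(h, stored) <= MAX_DISTANCE:
                return doc_id
        return None

    def add(self, doc_id: str, h: int) -> None:
        self.buckets.setdefault(self._prefix(h), []).append((doc_id, h))


index = NearDuplicateIndex()
fingerprint = simhash("the quick brown fox jumps over the lazy dog")
index.add("doc-1", fingerprint)
print(index.find_near_duplicate(fingerprint ^ 0b1))     # doc-1 (1 bit differs)
print(index.find_near_duplicate(fingerprint ^ 0b1111))  # None (4 bits differ)
```

A single prefix table misses near-duplicates whose differing bits fall inside the prefix. A common remedy is to keep several tables keyed on different bit subsets, but this sketch does not claim that the patent takes that approach.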
Abstract:
A method and system for intelligently directing a search of a peer-to-peer network, in which a user performing a search is assisted in choosing a host that is likely to return fast, favorable results. A host monitor monitors the peer-to-peer network and collects data on various characteristics of the hosts that make up the network. A host selector then ranks the hosts using this data and passes the ranking to the user, who selects one or more of the highly ranked hosts as an entry point into the network. Additionally, a cache may collect a list of hosts based on the content they hold, so that the user can choose to connect to a host known to contain information relevant to the search. The host selector may also be used to select from among the hosts listed in the cache.
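The abstract does not say which host characteristics are monitored or how they are combined. As a sketch only, the code below assumes three hypothetical measurements (latency, bandwidth, number of shared files), normalizes each against the best observed value, and ranks hosts by a weighted sum with illustrative weights. A content cache is modeled as a mapping from a topic to host addresses, and the same selector then ranks only those cached hosts. `HostStats`, `rank_hosts`, and `rank_cached_hosts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostStats:
    """Per-host data a host monitor might collect (hypothetical fields)."""
    address: str
    latency_ms: float
    bandwidth_kbps: float
    shared_files: int


def rank_hosts(hosts: List[HostStats], w_latency: float = 0.5,
               w_bandwidth: float = 0.3, w_files: float = 0.2) -> List[HostStats]:
    """Order hosts best-first by a weighted score of normalized characteristics."""
    if not hosts:
        return []
    max_lat = max(h.latency_ms for h in hosts) or 1.0
    max_bw = max(h.bandwidth_kbps for h in hosts) or 1.0
    max_files = max(h.shared_files for h in hosts) or 1

    def score(h: HostStats) -> float:
        # Lower latency is better; higher bandwidth and more shared files are better.
        return (w_latency * (1 - h.latency_ms / max_lat)
                + w_bandwidth * h.bandwidth_kbps / max_bw
                + w_files * h.shared_files / max_files)

    return sorted(hosts, key=score, reverse=True)


def rank_cached_hosts(cache: Dict[str, List[str]], topic: str,
                      stats: Dict[str, HostStats]) -> List[HostStats]:
    """Rank only the hosts that a content cache lists for `topic`."""
    candidates = [stats[addr] for addr in cache.get(topic, []) if addr in stats]
    return rank_hosts(candidates)


hosts = [HostStats("10.0.0.1", 200, 56, 10),
         HostStats("10.0.0.2", 40, 1500, 300),
         HostStats("10.0.0.3", 120, 512, 50)]
print([h.address for h in rank_hosts(hosts)])
# ['10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Normalizing against the current maximum keeps the weights comparable across units. The weights themselves are arbitrary placeholders that would need tuning.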
Abstract:
A table, such as a database table, can be partitioned into blocks that are conveniently sized for storage and retrieval. The storage space a block requires, and the time taken to store and retrieve it, are proportional to the block's size, so compressing the blocks reduces the space required and increases speed. The columns in a table, and therefore the rows in a transposed block, tend to contain similar data, and compression algorithms work more efficiently when sequential data items are similar. Therefore, transposing the blocks before compression, or compressing them column-wise, yields better compression. A different compression algorithm can be used for each set of columnar data to improve compression further.
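The following is a minimal sketch of the transpose-then-compress idea using only Python's standard `zlib`, `bz2`, and `lzma` modules. It assumes every cell is a string, joins each column's values with the ASCII unit separator (assumed never to appear in the data), and lets the caller choose a codec per column. `compress_block` and `decompress_block` are hypothetical names, not an API from the source.

```python
import bz2
import lzma
import zlib
from typing import Callable, Dict, List, Sequence, Tuple

# Standard-library codecs; choosing one per column is the illustrative part.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}
SEP = "\x1f"  # assumes no value contains the ASCII unit separator


def compress_block(rows: Sequence[Tuple[str, ...]],
                   codecs: List[str]) -> List[Tuple[str, bytes]]:
    """Transpose a block of rows, then compress each column with its own codec."""
    if rows and len(codecs) != len(rows[0]):
        raise ValueError("need exactly one codec per column")
    columns = list(zip(*rows))  # row i of the transposed block is table column i
    return [(name, CODECS[name][0](SEP.join(col).encode("utf-8")))
            for col, name in zip(columns, codecs)]


def decompress_block(block: List[Tuple[str, bytes]]) -> List[Tuple[str, ...]]:
    """Invert compress_block: decompress each column, then transpose back to rows."""
    columns = [CODECS[name][1](blob).decode("utf-8").split(SEP)
               for name, blob in block]
    return list(zip(*columns))


rows = [("2024-01-01", "ACTIVE", "19.99"),
        ("2024-01-01", "ACTIVE", "5.00"),
        ("2024-01-02", "CLOSED", "19.99")]
block = compress_block(rows, ["zlib", "bz2", "lzma"])
assert decompress_block(block) == rows
print([len(blob) for _, blob in block])  # compressed size of each column
```

On a three-row block every codec adds header overhead. The benefit of column-wise compression appears only on blocks of realistic size, where each column's similar values sit next to each other.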
Abstract:
A space-efficient system and method for generating an approximate φ-quantile data element of a data set in a single pass over the data set, without a priori knowledge of the size of the data set. The approximate φ-quantile is guaranteed to lie within a user-specified approximation error ε of the true quantile being sought with a probability of at least 1−δ, δ being a user-defined probability of failure. A set of b buffers, each having a capacity of k elements, is initially filled with elements from the data set, with the values of b and k depending on the approximation error ε and the probability δ. The buffers are then collapsed into an output buffer; the remaining buffers are then refilled with elements and collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output remains. The element of the output corresponding to the approximate quantile is then output as the approximate quantile. In later iterations (when the height of the tree is at least equal to a predetermined height that depends on δ and ε), the data is sampled non-uniformly to populate the buffers, achieving the desired performance. Parallel processors can be used, with the final output buffers of the processors being sent to a collecting processor P0 as its input buffers.
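The buffer-and-collapse mechanism can be sketched as follows. This is a simplified, deterministic illustration under stated assumptions. It takes b and k as plain parameters instead of deriving them from ε and δ, and it omits the non-uniform sampling phase and the parallel P0 merge. It uses a hypothetical collapse policy: when no buffer is empty, merge the buffers at the lowest tree level. Each collapse treats every element as repeated `weight` times and keeps k evenly spaced elements of that weighted sequence. `collapse` and `ApproxQuantile` are hypothetical names.

```python
import math
from typing import List, Tuple


def collapse(buffers: List[Tuple[int, List[float]]]) -> Tuple[int, List[float]]:
    """Collapse weighted, sorted buffers of k elements into one buffer of k elements.

    Each element conceptually appears `weight` times; k evenly spaced elements of
    that weighted sequence form the output, whose weight is the sum of the weights.
    """
    k = len(buffers[0][1])
    total_w = sum(w for w, _ in buffers)
    merged = sorted((x, w) for w, xs in buffers for x in xs)
    offset = (total_w + 1) // 2          # 1-based position within each run of total_w
    targets = [j * total_w + offset for j in range(k)]
    out: List[float] = []
    pos = t = 0
    for x, w in merged:
        pos += w                         # x occupies weighted positions (pos - w, pos]
        while t < k and targets[t] <= pos:
            out.append(x)
            t += 1
    return total_w, out


class ApproxQuantile:
    """b buffers of k elements each; full buffers are collapsed level by level."""

    def __init__(self, b: int, k: int) -> None:
        if b < 2 or k < 1:
            raise ValueError("need b >= 2 and k >= 1")
        self.b, self.k = b, k
        self.full: List[Tuple[int, int, List[float]]] = []  # (level, weight, sorted elements)
        self.current: List[float] = []                      # buffer being filled

    def add(self, x: float) -> None:
        self.current.append(x)
        if len(self.current) == self.k:
            self.full.append((0, 1, sorted(self.current)))
            self.current = []
            if len(self.full) == self.b:  # no empty buffer left
                self._collapse_lowest()

    def _collapse_lowest(self) -> None:
        levels = sorted({lvl for lvl, _, _ in self.full})
        lowest = [e for e in self.full if e[0] == levels[0]]
        pick = {levels[0]} if len(lowest) >= 2 else set(levels[:2])
        chosen = [(w, xs) for lvl, w, xs in self.full if lvl in pick]
        self.full = [e for e in self.full if e[0] not in pick]
        weight, elems = collapse(chosen)
        self.full.append((max(pick) + 1, weight, elems))

    def quantile(self, phi: float) -> float:
        """Approximate phi-quantile of all elements added so far."""
        items = sorted([(x, w) for _, w, xs in self.full for x in xs]
                       + [(x, 1) for x in self.current])
        if not items:
            raise ValueError("no data")
        target = max(1, math.ceil(phi * sum(w for _, w in items)))
        seen = 0
        for x, w in items:
            seen += w
            if seen >= target:
                return x
        return items[-1][0]


sketch = ApproxQuantile(b=10, k=200)
for value in range(1, 10001):
    sketch.add(value)
print(sketch.quantile(0.5))  # approximately 5000
```

Memory stays at b·k elements regardless of stream length. Without the sampling phase, though, this sketch carries none of the probabilistic ε/δ guarantee described in the abstract.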
Abstract:
The disclosure provides various embodiments of systems, methods, and software for supporting server-side product catalogs. Software for managing ad serving may comprise computer-readable instructions embodied on media and operable to identify a logically local directed graph representing a logically remote network property associated with a publisher. The network property is associated with at least one product catalog representing a package of network ad slots. The software may then generate an ad service flight plan for serving various ones of a plurality of ads associated with a first one of the network ad slots, using an iterative solution on the directed graph.
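The abstract does not define the graph's semantics or the iterative solution. The sketch below is a heavily hedged illustration under invented assumptions. Pages of the network property are nodes, each edge carries a hypothetical probability that a visitor follows it, expected visits are found by fixed-point iteration, and a slot's expected impressions are split across its ads in proportion to each ad's goal. A product catalog is modeled as a list of slot pages. `estimate_visits`, `flight_plan`, and `catalog_inventory` are hypothetical names, not the disclosed method.

```python
from typing import Dict, List, Tuple

# Hypothetical local model of a remote network property: each page links to
# other pages with the probability that a visitor follows the link.
Graph = Dict[str, List[Tuple[str, float]]]


def estimate_visits(graph: Graph, entry_visits: Dict[str, float],
                    max_iter: int = 100, tol: float = 1e-9) -> Dict[str, float]:
    """Iterate visits = entry + flow along edges until the values stop changing."""
    visits = dict(entry_visits)
    for _ in range(max_iter):
        new = dict(entry_visits)
        for page, edges in graph.items():
            for nxt, p in edges:
                new[nxt] = new.get(nxt, 0.0) + visits.get(page, 0.0) * p
        done = all(abs(new.get(n, 0.0) - visits.get(n, 0.0)) < tol
                   for n in set(new) | set(visits))
        visits = new
        if done:
            break
    return visits


def flight_plan(slot_page: str, visits: Dict[str, float],
                ad_goals: Dict[str, float]) -> Dict[str, float]:
    """Split a slot's expected impressions across its ads in proportion to their goals."""
    inventory = visits.get(slot_page, 0.0)
    total_goal = sum(ad_goals.values())
    if total_goal <= 0:
        return {ad: 0.0 for ad in ad_goals}
    return {ad: min(goal, inventory * goal / total_goal)
            for ad, goal in ad_goals.items()}


def catalog_inventory(catalog: List[str], visits: Dict[str, float]) -> float:
    """Expected impressions for a product catalog (a package of slot pages)."""
    return sum(visits.get(page, 0.0) for page in catalog)


site: Graph = {"home": [("sports", 0.5), ("news", 0.3)],
               "sports": [("scores", 0.6)]}
visits = estimate_visits(site, {"home": 1000.0})
print(flight_plan("scores", visits, {"ad_a": 200.0, "ad_b": 100.0}))
```

The iteration converges when every page's outgoing probabilities sum to less than 1, which holds whenever some visitors leave the site at each page, even if the graph contains cycles.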