Abstract:
An input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology. By utilizing such a grouping, map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., datasets related by a common primary key). The intermediate results of the map processing (key/value pairs) for a particular key can be processed together in a single reduce function by applying a different iterator to the intermediate values of each group. The different iterators can be arranged inside reduce functions in any desired way.
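The grouped map/reduce flow above can be illustrated with a minimal in-memory sketch. It assumes two hypothetical groups named "customers" and "orders" that share a customer key. Each group gets its own map function. Intermediate values are kept separate per group, and a single reduce call per key receives one iterator per group. The example reducer happens to perform an inner join. The function names `map_group`, `shuffle`, `join_reduce` and `run` are illustrative only and not taken from the source.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, object]
Mapper = Callable[[dict], Iterable[KV]]
Reducer = Callable[[str, Dict[str, Iterator[object]]], List[tuple]]


def map_group(records: Iterable[dict], mapper: Mapper) -> List[KV]:
    """Apply one group's map function independently to that group's records."""
    pairs: List[KV] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(grouped: Dict[str, List[KV]]) -> Dict[str, Dict[str, List[object]]]:
    """Collect intermediate values by key while keeping each group's values apart."""
    by_key: Dict[str, Dict[str, List[object]]] = defaultdict(lambda: defaultdict(list))
    for group, pairs in grouped.items():
        for key, value in pairs:
            by_key[key][group].append(value)
    return by_key


def join_reduce(key: str, iters: Dict[str, Iterator[object]]) -> List[tuple]:
    """Example reduce: inner join of the hypothetical 'customers' and 'orders' groups."""
    names = list(iters.get("customers", iter(())))  # materialise so it can be re-scanned
    return [(key, name, amount)
            for amount in iters.get("orders", iter(()))
            for name in names]


def run(datasets: Dict[str, Iterable[dict]], mappers: Dict[str, Mapper],
        reducer: Reducer) -> List[tuple]:
    """Map each group separately, then reduce each key with one iterator per group."""
    grouped = {group: map_group(records, mappers[group]) for group, records in datasets.items()}
    results: List[tuple] = []
    for key, groups in sorted(shuffle(grouped).items()):
        iters = {group: iter(values) for group, values in groups.items()}
        results.extend(reducer(key, iters))
    return results
```

Because the reducer sees a separate iterator per group, it can choose how to nest or interleave them. A join, an anti-join or a per-group aggregate differ only in how the iterators are arranged.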
Abstract:
A method of processing relationships of at least two datasets is provided. For each of the datasets, a map-reduce subsystem is provided such that the data of that dataset is mapped to corresponding intermediate data for that dataset. The intermediate data for that dataset is reduced to a set of reduced intermediate data for that dataset. Data corresponding to the sets of reduced intermediate data are merged in accordance with a merge condition. In some examples, the data being merged may include the output of one or more other mergers. More generally, merge functions may be placed flexibly among the various map-reduce subsystems; as such, the basic map-reduce architecture may be advantageously modified to process multiple relational datasets using, for example, clusters of computing devices.
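The map-reduce-merge arrangement above can be sketched on a single machine. In this sketch each dataset passes through its own map-reduce step. A merge step then combines every pair of reduced keys that satisfies a merge condition. The merge output has the same dictionary shape as a reduced output, so it can feed another merge. The sales/costs/targets datasets and the `combine` callback signature are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, object]


def map_reduce(records: Iterable[object],
               mapper: Callable[[object], Iterable[Pair]],
               reducer: Callable[[Hashable, List[object]], object]) -> Dict[Hashable, object]:
    """One map-reduce subsystem: map every record, group by key, reduce each group."""
    intermediate: Dict[Hashable, List[object]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)
    return {key: reducer(key, values) for key, values in intermediate.items()}


def merge(left: Dict[Hashable, object], right: Dict[Hashable, object],
          condition: Callable[[Hashable, Hashable], bool],
          combine: Callable[[Hashable, object, Hashable, object], Pair]) -> Dict[Hashable, object]:
    """Merge two reduced outputs: every key pair satisfying `condition` is combined.

    The result has the same shape as a map_reduce output, so it can itself be
    fed into another merge.
    """
    merged: Dict[Hashable, object] = {}
    for lkey, lvalue in left.items():
        for rkey, rvalue in right.items():
            if condition(lkey, rkey):
                key, value = combine(lkey, lvalue, rkey, rvalue)
                merged[key] = value
    return merged
```

As a usage example, summing sales and costs per region and then merging them with `lambda lk, lv, rk, rv: (lk, lv - rv)` under an equality condition yields profit per region. That result can then be merged with a third, per-region targets dataset, which illustrates a merge consuming another merger's output. The nested-loop merge is the simplest correct choice for an arbitrary condition. An equi-join condition could instead use a hash lookup.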
Abstract:
A MapReduce architecture may be utilized for sequence alignment algorithm processing (such as BLAST or BLAST-like algorithms). In addition, a MapReduce architecture may be extended such that memory of the computing devices of a MapReduce-configured system may be shared between different jobs of sequence alignment and/or other bioinformatics algorithm processing, thereby reducing overhead associated with executing such jobs using the MapReduce-configured system.
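The following toy sketch shows the general shape of sequence alignment on MapReduce with memory shared across jobs. It is not BLAST. As a stand-in for BLAST's seeding stage, the map step counts shared k-mer seeds between a query and each database sequence in one shard. The reduce step keeps the top hits per query across all shards. `functools.lru_cache` stands in for worker memory that persists between jobs: each shard's k-mer index is built once and reused by later jobs. The shard contents, `k = 3` and all function names are hypothetical.

```python
from functools import lru_cache
from heapq import nlargest
from typing import Dict, List, Tuple

# Hypothetical sequence database, pre-split into shards (one shard per worker).
SHARDS: Dict[int, Tuple[Tuple[str, str], ...]] = {
    0: (("s1", "ACGTACGT"), ("s2", "TTTTTTTT")),
    1: (("s3", "ACGTTTTT"),),
}


@lru_cache(maxsize=None)
def shard_index(shard_id: int, k: int) -> Dict[str, frozenset]:
    """Build a k-mer -> subject-id index for one shard.

    The cache stands in for memory shared between jobs: the index is built by the
    first job that needs it and reused by every later job.
    """
    index: Dict[str, set] = {}
    for subject_id, seq in SHARDS[shard_id]:
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(subject_id)
    return {kmer: frozenset(ids) for kmer, ids in index.items()}


def map_align(shard_id: int, query_id: str, query: str,
              k: int = 3) -> List[Tuple[str, Tuple[int, str]]]:
    """Map: count shared k-mer seeds between the query and each subject in a shard."""
    index = shard_index(shard_id, k)
    hits: Dict[str, int] = {}
    for i in range(len(query) - k + 1):
        for subject_id in index.get(query[i:i + k], ()):
            hits[subject_id] = hits.get(subject_id, 0) + 1
    return [(query_id, (count, subject_id)) for subject_id, count in hits.items()]


def reduce_top(query_id: str, scored: List[Tuple[int, str]],
               top_n: int = 2) -> Tuple[str, List[Tuple[int, str]]]:
    """Reduce: keep the best-scoring subjects for one query across all shards."""
    return query_id, nlargest(top_n, scored)


def run_job(queries: Dict[str, str]) -> Dict[str, List[Tuple[int, str]]]:
    """One alignment job: map over every (shard, query) pair, then reduce per query."""
    intermediate: Dict[str, List[Tuple[int, str]]] = {}
    for shard_id in SHARDS:
        for query_id, query in queries.items():
            for key, value in map_align(shard_id, query_id, query):
                intermediate.setdefault(key, []).append(value)
    return dict(reduce_top(qid, values) for qid, values in intermediate.items())
```

Running `run_job` a second time builds no new indexes (`shard_index.cache_info().misses` stays constant). This is the kind of per-job overhead that sharing memory between jobs is meant to avoid.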
Abstract:
Image-based features may be significantly correlated with the click-through rates of images that depict a product. This may provide a more formal basis for the informal notion that good-quality images yield better click-through rates than poor-quality images. Accordingly, an image assessment machine is configured to analyze image-based features to improve click-through rates for shopping search applications (e.g., a product search engine). Moreover, the image assessment machine may rank search results based on image quality factors and may notify sellers about low-quality images. This may improve the brand value of an online shopping website and thereby have a positive long-term impact on the website.
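As an illustration of ranking by image quality and flagging low-quality images, here is a minimal sketch. It computes three crude, hypothetical image features from a grayscale pixel grid: brightness (mean), contrast (population standard deviation) and a sharpness proxy (mean horizontal gradient). It combines them with made-up weights into a quality score, boosts each result's relevance by that score, and lists the sellers whose images fall below a threshold. The features, `WEIGHTS`, `alpha` and `threshold` are assumptions, not values from the source. A real system would learn them from click-through data.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass
class Listing:
    item_id: str
    seller: str
    relevance: float           # score from the underlying product search engine
    pixels: List[List[float]]  # grayscale image, values in [0, 1]


# Hypothetical feature weights; in practice these would be fit to click-through data.
WEIGHTS: Dict[str, float] = {"brightness": 0.2, "contrast": 0.4, "sharpness": 0.4}


def image_features(pixels: List[List[float]]) -> Dict[str, float]:
    """Crude image-based features computed directly from the pixel grid."""
    flat = [p for row in pixels for p in row]
    grads = [abs(row[i + 1] - row[i]) for row in pixels for i in range(len(row) - 1)]
    return {
        "brightness": mean(flat),
        "contrast": pstdev(flat),
        "sharpness": mean(grads) if grads else 0.0,
    }


def quality(pixels: List[List[float]]) -> float:
    """Weighted sum of the features as a single image-quality score."""
    return sum(WEIGHTS[name] * value for name, value in image_features(pixels).items())


def rank_and_flag(listings: List[Listing], alpha: float = 0.5,
                  threshold: float = 0.2) -> Tuple[List[str], List[str]]:
    """Re-rank results by relevance boosted by image quality; flag low-quality sellers."""
    scored = [(l.relevance * (1 + alpha * quality(l.pixels)), l) for l in listings]
    ranked = [l.item_id for _, l in sorted(scored, key=lambda s: s[0], reverse=True)]
    flagged = sorted({l.seller for l in listings if quality(l.pixels) < threshold})
    return ranked, flagged
```

With equal relevance, a flat gray image scores lower than a high-contrast one. Its listing therefore drops in the ranking and its seller is returned in the list of sellers to notify.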
Abstract:
This disclosure describes systems and methods for identifying and correcting anomalies in web graphs. A web graph is transformed into a sequence of tokens via a walk algorithm. The sequence is fingerprinted to form a set of shingles. The shingles are compared to the shingles of other web graphs to determine the similarity between web graphs. Actions are then carried out to remove anomalous web graphs and to modify the parameters governing web mapping, in order to decrease the likelihood that anomalous web graphs are built in the future.
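The walk, shingle and compare pipeline described above can be sketched as follows. The sketch makes several assumptions. The walk is a deterministic depth-first traversal that emits node labels in visit order. Shingles are truncated SHA-1 hashes of every run of `w` consecutive tokens. Similarity is the Jaccard index of the two shingle sets. A graph counts as anomalous here if it is dissimilar to every reference graph. The walk order, `w = 3` and the 0.5 threshold are illustrative choices. The sketch does not cover the step of adjusting web-mapping parameters.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def walk_tokens(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Deterministic depth-first walk; emits each reachable node once, in visit order."""
    seen: Set[str] = set()
    order: List[str] = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours in reverse-sorted order so the smallest is visited first.
        stack.extend(sorted(graph.get(node, []), reverse=True))
    return order


def shingles(tokens: List[str], w: int = 3) -> Set[str]:
    """Fingerprint every window of w consecutive tokens."""
    return {
        hashlib.sha1(" ".join(tokens[i:i + w]).encode("utf-8")).hexdigest()[:16]
        for i in range(len(tokens) - w + 1)
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_anomalous(candidate: Set[str], references: Iterable[Set[str]],
                 threshold: float = 0.5) -> bool:
    """Flag a graph whose shingles are dissimilar to every reference graph."""
    return all(jaccard(candidate, ref) < threshold for ref in references)
```

For example, a graph that adds one leaf node to a reference graph shares two of its three shingles with that reference (similarity 2/3). A graph with an unrelated link structure shares none and is flagged.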
Abstract:
Methods and systems are provided for search engine output-associated bidding in online advertising. Techniques are provided in which an advertiser may specify, as part of a bid, one or more requirements relating to search engine output. The one or more requirements may need to be met for an advertisement to be served in connection with the bid.
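One way to picture a bid that carries search-engine-output requirements is a minimal sketch in which each requirement is a predicate over the organic results shown for a query. A bid is eligible only if all of its predicates hold, and the highest eligible bid is served. The `SearchOutput` fields, the example requirements and the simple highest-bid selection rule are hypothetical. Real ad auctions use more elaborate ranking.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SearchOutput:
    query: str
    result_domains: List[str]  # domains of the organic results, in rank order
    num_results: int


@dataclass
class Bid:
    ad_id: str
    amount: float
    # Each requirement inspects the search engine output and returns True if met.
    requirements: List[Callable[[SearchOutput], bool]] = field(default_factory=list)


def eligible_bids(bids: List[Bid], output: SearchOutput) -> List[Bid]:
    """Keep only bids whose every requirement is satisfied by this output."""
    return [b for b in bids if all(req(output) for req in b.requirements)]


def select_ad(bids: List[Bid], output: SearchOutput) -> Optional[str]:
    """Serve the highest eligible bid, or nothing if no bid qualifies."""
    candidates = eligible_bids(bids, output)
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.amount).ad_id
```

For instance, an advertiser might bid only when a particular competitor's domain is absent from the results, or only when the query returns at least some minimum number of results. The higher bid loses if its requirement fails.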
Abstract:
Disclosed are apparatus and methods for providing next click information regarding search results. In certain embodiments, as objects (such as web pages, images, videos, and audio files) are searched and clicked, click information is retained. Next click information with respect to specific objects can then be determined and provided to the initiator of an object search, so that it is presented along with the search result objects, for example, during a search query.
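The retained click information can be turned into next click information along the lines of this minimal sketch. It assumes that click logs are available as per-session ordered lists of clicked object identifiers. For every object it counts which object was clicked immediately afterwards, then attaches the most frequent next clicks to each search result. The session format, object identifiers and `top_n` are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def build_next_clicks(sessions: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count, for each clicked object, which object was clicked right after it."""
    next_clicks: Dict[str, Counter] = defaultdict(Counter)
    for clicks in sessions:
        for current, following in zip(clicks, clicks[1:]):
            next_clicks[current][following] += 1
    return dict(next_clicks)  # plain dict so lookups never create empty entries


def annotate_results(results: List[str], next_clicks: Dict[str, Counter],
                     top_n: int = 2) -> List[Tuple[str, List[str]]]:
    """Pair each search result with its most frequent next-clicked objects."""
    annotated = []
    for obj in results:
        counts = next_clicks.get(obj, Counter())
        annotated.append((obj, [o for o, _ in counts.most_common(top_n)]))
    return annotated
```

A result that was never followed by another click is returned with an empty list. The presentation layer can then omit the next click hint for that result.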
Abstract:
A system determines the performance of an integrated circuit (IC). During operation, the system receives probability distributions for parameters of the IC. Next, the system generates samples of the IC, wherein generating a given sample involves using the probability distributions to assign values to the parameters for components within the IC. The system then calculates output performance metrics for the samples based on the assigned parameter values, and uses the calculated metrics to generate a distribution of output performance metrics for the samples.
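The sampling procedure above is a Monte Carlo analysis. A minimal sketch follows, using a hypothetical circuit: a path of four buffered RC stages whose resistance and capacitance are drawn independently from normal distributions. The output metric is the path delay, approximated as the sum of each stage's 50% step-response delay, ln(2)·R·C. The sample delays are then summarized as a distribution (mean, standard deviation, an approximate 99th percentile, and optionally the yield against a delay spec). The parameter values, stage count and delay model are assumptions chosen for illustration.

```python
import math
import random
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

# Hypothetical per-component parameter distributions: name -> (mean, std dev).
PARAMS: Dict[str, Tuple[float, float]] = {
    "R_ohm": (1000.0, 50.0),
    "C_farad": (1e-12, 0.05e-12),
}
STAGES = 4  # hypothetical number of RC stages on the path


def draw_sample(rng: random.Random) -> List[Dict[str, float]]:
    """One IC sample: assign every component's parameters from their distributions."""
    return [{name: rng.gauss(mu, sigma) for name, (mu, sigma) in PARAMS.items()}
            for _ in range(STAGES)]


def path_delay(sample: List[Dict[str, float]]) -> float:
    """Output metric: sum of each stage's 50% delay, ln(2) * R * C."""
    return sum(math.log(2) * s["R_ohm"] * s["C_farad"] for s in sample)


def monte_carlo(n: int, seed: int = 0,
                spec: Optional[float] = None) -> Tuple[List[float], Dict[str, float]]:
    """Generate n samples and summarise the resulting delay distribution."""
    rng = random.Random(seed)
    delays = sorted(path_delay(draw_sample(rng)) for _ in range(n))
    summary = {
        "mean": mean(delays),
        "std": pstdev(delays),
        "p99": delays[int(0.99 * (n - 1))],  # approximate 99th percentile
    }
    if spec is not None:
        summary["yield"] = sum(d <= spec for d in delays) / n  # fraction meeting spec
    return delays, summary
```

With these numbers the nominal delay is 4 · ln 2 · 1 kΩ · 1 pF ≈ 2.77 ns. Setting `spec` to that nominal value should give a yield near 50%, because the delay distribution is roughly symmetric about its mean. Seeding `random.Random` makes each run reproducible.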