Abstract:
A MapReduce architecture may be utilized for sequence alignment algorithm processing (such as BLAST or BLAST-like algorithms). In addition, a MapReduce architecture may be extended such that memory of the computing devices of a MapReduce-configured system may be shared between different jobs of sequence alignment and/or other bioinformatics algorithm processing, thereby reducing overhead associated with executing such jobs using the MapReduce-configured system.
Abstract:
An input data set is treated as a plurality of grouped sets of key/value pairs, which enhances the utility of the MapReduce programming methodology. By utilizing such a grouping, map processing can be carried out independently on two or more related but possibly heterogeneous datasets (e.g., related by being characterized by a common primary key). The intermediate results of the map processing (key/value pairs) for a particular key can be processed together in a single reduce function by applying a different iterator to the intermediate values for each group. The iterators can be arranged inside the reduce function in any desired way.
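A minimal single-process sketch of the grouped approach follows. Two heterogeneous datasets sharing a primary key are mapped independently, and the reduce function receives one iterator per group for each key. The dataset names (`employees`, `departments`), the driver `run_grouped_mapreduce`, and the join-style reducer are hypothetical choices made for illustration.

```python
from collections import defaultdict


def map_employees(record):
    """Map an (dept_id, name) record to a key/value pair."""
    dept_id, name = record
    yield dept_id, name


def map_departments(record):
    """Map a (dept_id, title) record to a key/value pair."""
    dept_id, title = record
    yield dept_id, title


def run_grouped_mapreduce(grouped_inputs, mappers, reducer):
    """Map each group independently, then reduce per key with one iterator per group."""
    # intermediate[key][group] holds the values emitted for that key by that group
    intermediate = defaultdict(lambda: {g: [] for g in grouped_inputs})
    for group, records in grouped_inputs.items():
        for record in records:
            for key, value in mappers[group](record):
                intermediate[key][group].append(value)
    output = []
    for key in sorted(intermediate):
        iterators = {g: iter(vals) for g, vals in intermediate[key].items()}
        output.extend(reducer(key, iterators))
    return output


def join_reducer(key, iterators):
    """Nest the two group iterators to pair every employee with the department title."""
    titles = list(iterators["departments"])
    for name in iterators["employees"]:
        for title in titles:
            yield key, name, title


grouped_inputs = {
    "employees": [(1, "ann"), (2, "bob"), (1, "cy")],
    "departments": [(1, "eng"), (2, "ops")],
}
mappers = {"employees": map_employees, "departments": map_departments}
joined = run_grouped_mapreduce(grouped_inputs, mappers, join_reducer)
```

Here the reducer nests one iterator inside the other to produce a join; a different arrangement of the same iterators (for example, consuming them in sequence) would produce a union instead.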
Abstract:
A method of processing relationships of at least two datasets is provided. For each of the datasets, a map-reduce subsystem is provided such that the data of that dataset is mapped to corresponding intermediate data for that dataset. The intermediate data for that dataset is reduced to a set of reduced intermediate data for that dataset. Data corresponding to the sets of reduced intermediate data are merged, in accordance with a merge condition. In some examples, data being merged may include the output of one or more other mergers. That is, generally, merge functions may be flexibly placed among various map-reduce subsystems and, as such, the basic map-reduce architecture may be advantageously modified to process multiple relational datasets using, for example, clusters of computing devices.
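The map, reduce, and merge stages described above can be sketched as follows. Each dataset passes through its own map-reduce subsystem, and the two reduced outputs are then merged under a caller-supplied merge condition. The helper names `map_reduce` and `merge`, and the sales/staff example data, are assumptions made for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One map-reduce subsystem: map records, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def merge(left, right, condition, combine):
    """Merge two reduced outputs, keeping key pairs that satisfy the merge condition."""
    return [
        combine(lk, lv, rk, rv)
        for lk, lv in sorted(left.items())
        for rk, rv in sorted(right.items())
        if condition(lk, rk)
    ]


sales = [("eng", 100), ("eng", 50), ("ops", 30)]
staff = [("eng", "ann"), ("eng", "bob"), ("ops", "cy")]

# Subsystem 1: total sales per department.
totals = map_reduce(sales, lambda r: [(r[0], r[1])], lambda k, vs: sum(vs))
# Subsystem 2: headcount per department.
heads = map_reduce(staff, lambda r: [(r[0], 1)], lambda k, vs: sum(vs))

# Merge condition: keys are equal. Combine: sales per head.
per_head = merge(
    totals,
    heads,
    condition=lambda lk, rk: lk == rk,
    combine=lambda lk, lv, rk, rv: (lk, lv / rv),
)
```

Because `merge` returns ordinary records, its output could itself be fed into another `map_reduce` or `merge` call, which corresponds to the flexible placement of merge functions mentioned in the abstract.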
Abstract:
The application illustrates a light-emitting device including a contact layer and a current spreading layer on the contact layer. A part of the contact layer is a rough structure and a part of the contact layer is a flat structure. A part of the current spreading layer is a rough structure and a part of the current spreading layer is a flat structure. The rough region of the contact layer and the rough region of the current spreading layer substantially overlap.
Abstract:
A light-emitting device comprises a semiconductor stacked structure, the semiconductor stacked structure comprising a p-type semiconductor layer, an n-type semiconductor layer and a multiple quantum well structure between the p-type semiconductor layer and the n-type semiconductor layer, wherein the multiple quantum well structure comprises a first multiple quantum well structure near the n-type semiconductor layer and a second multiple quantum well structure near the p-type semiconductor layer, wherein the first multiple quantum well structure has positive interface bound charge and the second multiple quantum well structure has zero interface bound charge.
Abstract:
A light emitting diode (LED) illumination device (10) includes an LED (11), a frustum-shaped light guide member (12) and a light diffusing plate (13). The frustum-shaped light guide member (12) has a light input surface (120) and a light output surface (122) opposite to the light input surface (120). The light guide member (12) tapers from the light output surface (122) to the light input surface (120). The light input surface (120) is optically coupled to the LED (11), and the light diffusing plate (13) is optically coupled to the light output surface (122) of the light guide member (12).
Abstract:
Techniques for identifying influential users of a social networking service are provided. Influential users may be identified via an algorithm in which an influence score is assigned to each user based at least in part on other members of the community having taken an affirmative step with respect to the user's communications. Iterative processing may be performed, with each user's influence score being determined by contributions from other users, and each contribution being determined by the contributor's influence score as of a prior iteration. A map-reduce framework may be employed, with data representing the community being partitioned into a plurality of discrete shards, a map process corresponding to each shard calculating an influence score for users represented in the shard, and reduce processes ranking users according to influence score across all shards.
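The iterative, sharded scoring can be sketched as below. The abstract does not give a scoring formula, so this sketch assumes a damped, PageRank-style update in which each endorser splits its prior-iteration score evenly among the users it endorsed. The damping factor, the round-robin sharding, and all function names are assumptions for illustration.

```python
def shard_users(users, num_shards):
    """Partition users into discrete shards (round-robin, an arbitrary choice)."""
    ordered = sorted(users)
    return [ordered[i::num_shards] for i in range(num_shards)]


def map_shard(shard, endorsements, prior, damping=0.85):
    """Map step: score each user in the shard from endorsers' prior-iteration scores."""
    out_degree = {}
    for endorser, _author in endorsements:
        out_degree[endorser] = out_degree.get(endorser, 0) + 1
    scores = {}
    for user in shard:
        received = sum(
            prior[endorser] / out_degree[endorser]
            for endorser, author in endorsements
            if author == user
        )
        scores[user] = (1 - damping) + damping * received
    return scores


def reduce_rank(shard_scores):
    """Reduce step: combine per-shard scores and rank users across all shards."""
    merged = {}
    for scores in shard_scores:
        merged.update(scores)
    ranking = sorted(merged, key=lambda u: (-merged[u], u))
    return ranking, merged


def influence_ranking(users, endorsements, num_shards=2, iterations=50):
    """Iterate map and reduce; each pass uses the scores of the previous pass."""
    scores = {user: 1.0 for user in users}
    ranking = sorted(users)
    shards = shard_users(users, num_shards)
    for _ in range(iterations):
        shard_scores = [map_shard(s, endorsements, scores) for s in shards]
        ranking, scores = reduce_rank(shard_scores)
    return ranking, scores


# Each pair is (endorser, author): the endorser took an affirmative step
# (e.g. a repost) with respect to the author's communication.
endorsements = [("b", "a"), ("c", "a"), ("a", "b")]
ranking, scores = influence_ranking(["a", "b", "c"], endorsements)
```

In this toy community, user `a` is endorsed by two users and ends up ranked first, while `c`, whom nobody endorses, keeps only the base score.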
Abstract:
A canopy clustering process merges at least one set of multiple single-center canopies together into a merged multi-center canopy. Multi-center canopies, as well as the single-center canopies, can then be used to partition data objects in a dataset. The multi-center canopies allow a canopy assignment condition constraint to be relaxed without risk of leaving any data objects in a dataset outside of all canopies. Approximate distance calculations can be used as similarity metrics to define and merge canopies and to assign data objects to canopies. In one implementation, a distance between a data object and a canopy is represented as the minimum of the distances between the data object and each center of a canopy (whether merged or unmerged), and the distance between two canopies is represented as the minimum of the distances for each pairing of the center(s) in one canopy and the center(s) in the other canopy.
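The two minimum-distance definitions in the last sentence can be written down directly. In this sketch a canopy is simply a list of its center points, and plain Euclidean distance stands in for whatever approximate distance measure an implementation would use. The thresholds and the greedy merging loop are assumptions for illustration, not the claimed procedure.

```python
import math


def object_to_canopy(obj, centers):
    """Distance from a data object to a canopy: minimum over the canopy's centers."""
    return min(math.dist(obj, center) for center in centers)


def canopy_to_canopy(canopy_a, canopy_b):
    """Distance between two canopies: minimum over all pairings of their centers."""
    return min(math.dist(p, q) for p in canopy_a for q in canopy_b)


def merge_canopies(canopies, merge_threshold):
    """Greedily merge canopies whose canopy-to-canopy distance is within the threshold."""
    merged = []
    for canopy in canopies:
        current = list(canopy)
        kept = []
        for other in merged:
            if canopy_to_canopy(current, other) <= merge_threshold:
                current = other + current  # becomes a multi-center canopy
            else:
                kept.append(other)
        merged = kept + [current]
    return merged


def assign(obj, canopies, assign_threshold):
    """Return indices of every canopy whose distance to the object is within the threshold."""
    return [
        i
        for i, centers in enumerate(canopies)
        if object_to_canopy(obj, centers) <= assign_threshold
    ]


single_center = [[(0.0, 0.0)], [(1.0, 0.0)], [(10.0, 0.0)]]
canopies = merge_canopies(single_center, merge_threshold=1.5)
membership = assign((2.0, 0.0), canopies, assign_threshold=1.0)
```

The object at `(2.0, 0.0)` lies 2.0 away from the first center but only 1.0 from the second, so it is assigned to the merged two-center canopy even under the tighter assignment threshold.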