Abstract:
An approach is provided for initiating generation of a media compilation based on one or more sampling criteria. A sampling platform determines at least one subset of one or more media items captured of at least one event. The sampling platform also partitions the at least one subset of the one or more media items into one or more bins and generates at least one compilation of the at least one subset of the one or more media items based, at least in part, on whether the one or more media items in the one or more bins at least substantially meet the one or more sampling criteria.
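A minimal Python sketch of the binning step, assuming the subset of media items has already been selected. The `MediaItem` fields, the fixed-width time bins, and the quality-threshold criterion are all hypothetical; the abstract does not say how bins are formed or what the sampling criteria are.

```python
from dataclasses import dataclass

@dataclass
class MediaItem:
    item_id: str
    timestamp: float  # hypothetical: seconds since the event started
    quality: float    # hypothetical quality score in [0, 1]

def partition_into_bins(items, bin_seconds):
    """Partition items into consecutive time bins of width bin_seconds."""
    bins = {}
    for item in items:
        bins.setdefault(int(item.timestamp // bin_seconds), []).append(item)
    return [bins[index] for index in sorted(bins)]

def bin_meets_criteria(bin_items, min_quality):
    """Hypothetical sampling criterion: some item in the bin reaches min_quality."""
    return any(item.quality >= min_quality for item in bin_items)

def generate_compilation(items, bin_seconds=10.0, min_quality=0.5):
    """Return the id of the best item from each qualifying bin, in time order."""
    compilation = []
    for bin_items in partition_into_bins(items, bin_seconds):
        if bin_meets_criteria(bin_items, min_quality):
            best = max(bin_items, key=lambda item: item.quality)
            compilation.append(best.item_id)
    return compilation

items = [
    MediaItem("a", 1.0, 0.9), MediaItem("b", 3.0, 0.4),
    MediaItem("c", 12.0, 0.3),
    MediaItem("d", 25.0, 0.7), MediaItem("e", 27.0, 0.8),
]
print(generate_compilation(items))  # ['a', 'e']
```

The middle bin is skipped because none of its items reaches the threshold; a real criterion could just as well weigh coverage, diversity, or source device.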
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
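As an illustration of partitioning items by a composite primary key, here is a toy in-memory table in Python. The class, its method names, the SHA-256 partition hash, and the explicit `repartition` call are assumptions for the sketch, not the service's actual API.

```python
import hashlib

class PartitionedTable:
    """Toy non-relational table with a composite primary key (assumed layout):
    the hash of the partition key selects a partition, and the range key
    orders items that share a partition key."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, partition_key):
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put_item(self, partition_key, range_key, attributes):
        rows = self._partition_for(partition_key).setdefault(partition_key, {})
        rows[range_key] = attributes  # store, or overwrite an existing item

    def query(self, partition_key, item_filter=lambda attributes: True):
        """Items under one partition key, ordered by range key, filtered."""
        rows = self._partition_for(partition_key).get(partition_key, {})
        return [(rk, rows[rk]) for rk in sorted(rows) if item_filter(rows[rk])]

    def repartition(self, num_partitions):
        """Rehash every partition key, e.g. after a detected workload change."""
        old_partitions = self.partitions
        self.partitions = [{} for _ in range(num_partitions)]
        for partition in old_partitions:
            for partition_key, rows in partition.items():
                self._partition_for(partition_key)[partition_key] = rows

table = PartitionedTable(4)
table.put_item("user#1", 2, {"score": 10})
table.put_item("user#1", 1, {"score": 40})
table.put_item("user#2", 1, {"score": 30})
table.repartition(8)
print(table.query("user#1", lambda a: a["score"] > 20))  # [(1, {'score': 40})]
```

Because the schema is flexible, each item's attributes are just a dictionary; only the two key parts are interpreted by the table.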
Abstract:
A data repository system and method are provided. A method in accordance with an embodiment includes an operation that ports data from one or more existing database partitions to new database partitions according to a minimally progressive hash. The method can be used to increase the overall size of databases while the system keeps running ("hot"), with little or no downtime.
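The abstract does not define the "minimally progressive hash", so the sketch below swaps in a different, well-known technique with a similar goal: jump consistent hashing (Lamping and Veach). When the partition count grows, a key either stays where it was or moves to one of the newly added partitions, so only that data has to be ported. `keys_to_port` is a hypothetical helper.

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets).
    Growing from n to n + 1 buckets moves only keys that land in bucket n."""
    mask = (1 << 64) - 1
    key &= mask
    bucket, jump = -1, 0
    while jump < num_buckets:
        bucket = jump
        key = (key * 2862933555777941757 + 1) & mask  # 64-bit LCG step
        jump = int((bucket + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return bucket

def keys_to_port(keys, old_partitions, new_partitions):
    """Return {key: (old_partition, new_partition)} for keys that must move."""
    moves = {}
    for key in keys:
        old = jump_consistent_hash(key, old_partitions)
        new = jump_consistent_hash(key, new_partitions)
        if old != new:
            moves[key] = (old, new)
    return moves

# Growing from 4 to 5 partitions: every moved key goes to the new partition 4.
moves = keys_to_port(range(10_000), 4, 5)
print(len(moves), {new for _, new in moves.values()})
```

Since unmoved keys never need to be touched, the old partitions can keep serving traffic while only the moving subset is copied, which is the property that makes online growth practical.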
Abstract:
A method for determining when a database system query optimizer should employ join skew avoidance steps. The method includes dynamically calculating the worst-case anticipated frequency distribution for a particular relation along a particular set of join column(s) at query execution time. The calculated frequency distribution value is compared to a skew threshold, the skew threshold representing the number of rows sharing the same distinct value that would lead to avoidable processing inefficiencies. It is then determined that the database system query optimizer should employ join skew avoidance steps if the calculated frequency distribution value exceeds the skew threshold.
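A simplified sketch of the decision rule, assuming rows are dictionaries and that the worst-case frequency can be obtained by counting rows directly (the abstract's "anticipated" distribution need not come from a full count). The function names and the per-unit threshold in the example are hypothetical.

```python
from collections import Counter

def worst_case_frequency(rows, join_columns):
    """Largest number of rows sharing one distinct value of the join columns."""
    counts = Counter(tuple(row[col] for col in join_columns) for row in rows)
    return max(counts.values(), default=0)

def should_use_skew_avoidance(rows, join_columns, skew_threshold):
    """Employ skew avoidance only if the worst-case frequency exceeds the threshold."""
    return worst_case_frequency(rows, join_columns) > skew_threshold

orders = [{"customer_id": 7}] * 6 + [{"customer_id": c} for c in range(4)]
# Hypothetical threshold: an even share of rows for each of 4 processing units.
threshold = len(orders) / 4
print(worst_case_frequency(orders, ["customer_id"]),
      should_use_skew_avoidance(orders, ["customer_id"], threshold))  # 6 True
```

Six of the ten rows share `customer_id` 7, so a hash-redistributed join would send them all to one unit, well above its 2.5-row fair share.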
Abstract:
Prediction-based compression engines are spoon-fed with sequentially efficiently compressible (SEC) streams of input data that make it possible for the compression engines to more efficiently compress or otherwise compact the incoming data than would be possible with streams of input data accepted on a TV-raster scan basis. Various techniques are disclosed for intentionally forming SEC input data streams. Among these are the tight packing of similar files or fragments into concatenation suitcases and the decomposition of files into substantially predictably consistent (SPC) fragments or segments that are routed to different suitcases according to their type. In a graphics-directed embodiment, image frames are partitioned into segment areas that are internally SPC, and multidirectional walks (i.e., U-turning walks) are defined in the segment areas; these walks are traced during compression and also during decompression. A variety of pre-compression data transformation methods are disclosed for causing apparently random data sequences to appear more compressibly alike to each other. The methods are usable in systems that permit substantially longer times for data compaction operations than for data decompaction operations.
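A small sketch of the "suitcase" idea: fragments are routed by type into suitcases, each suitcase is concatenated and compressed on its own, and an index recorded at packing time restores the original order. zlib (DEFLATE) stands in for the prediction-based engines the abstract describes, and the type tags are assumed to be supplied by the caller.

```python
import zlib
from collections import defaultdict

def pack_suitcases(fragments):
    """fragments: list of (type_tag, bytes). Concatenate fragments of the same
    type into one suitcase per type and compress each suitcase separately."""
    grouped = defaultdict(list)
    for position, (type_tag, data) in enumerate(fragments):
        grouped[type_tag].append((position, data))
    suitcases = {}
    for type_tag, members in grouped.items():
        blob = b"".join(data for _, data in members)
        index = [(position, len(data)) for position, data in members]
        suitcases[type_tag] = (zlib.compress(blob, 9), index)
    return suitcases

def unpack_suitcases(suitcases):
    """Decompress every suitcase and restore the original fragment order."""
    restored = {}
    for type_tag, (compressed, index) in suitcases.items():
        blob = zlib.decompress(compressed)
        offset = 0
        for position, length in index:
            restored[position] = (type_tag, blob[offset:offset + length])
            offset += length
    return [restored[position] for position in sorted(restored)]

fragments = [
    ("text", b"sensor log line\n" * 20),
    ("binary", bytes(range(64))),
    ("text", b"sensor log line\n" * 5),
]
suitcases = pack_suitcases(fragments)
assert unpack_suitcases(suitcases) == fragments
```

Keeping alike fragments adjacent lets the compressor reuse context across them instead of restarting on every change of data type.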
Abstract:
An analyzer/classifier/synthesizer/prioritizing tool for data uses an admissible geometrization process, with the data transformed and partitioned by an input process into one or more input matrices, one or more partition classes, and one or more scale groups. The data to be analyzed/classified/synthesized/prioritized is processed by an admissible geometrization technique such as 2-partition modified individual differences multidimensional scaling (2p-IDMDS) to produce at least a measure of geometric fit. Using the measure of geometric fit and possibly other 2p-IDMDS output, a back end process analyzes, synthesizes, classifies, and prioritizes data through patterns, structure, and relations within the data.
Abstract:
A method is provided for managing, in a computer system, design of a database system having a set of schemata. The method includes, in a first computer process, extracting dependencies from the database system and identifying the set of schemata. The method further includes, for each specific schema in the set of schemata, creating in a second computer process a partition that, in turn, contains a further partition for each element of the specific schema, so as to establish a hierarchy of partitions in accordance with the structure of the set of schemata. The method also includes storing a representation of the database system including subsystems, dependency relationships among the subsystems, and the hierarchy of partitions. Finally, the method includes providing a graphical output from the computer system, based on the stored representation, in which appears a display of the subsystems in a hierarchy of partitions within a dependency structure matrix, such matrix graphically indicating the dependency relationships among subsystems. Related apparatus and computer products are also provided.
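A sketch of the partition hierarchy and the dependency structure matrix in Python. The input shapes (a dict of schemata, a set of dependency pairs of qualified names) and the row-depends-on-column convention are assumptions; DSM tools differ on which axis denotes the dependent.

```python
def build_partition_hierarchy(schemata):
    """schemata: {schema: [element, ...]}. One partition per schema, each
    containing one nested partition per element (qualified name)."""
    return {schema: [f"{schema}.{element}" for element in elements]
            for schema, elements in sorted(schemata.items())}

def dependency_structure_matrix(schemata, dependencies):
    """dependencies: set of (dependent, provider) qualified-name pairs.
    Rows and columns follow the partition hierarchy; cell [i][j] is 1 when
    element i depends on element j."""
    hierarchy = build_partition_hierarchy(schemata)
    labels = [name for members in hierarchy.values() for name in members]
    matrix = [[1 if (row, col) in dependencies else 0 for col in labels]
              for row in labels]
    return labels, matrix

def render(labels, matrix):
    """Plain-text rendering: one row per element, X marks a dependency."""
    width = max(len(label) for label in labels)
    return "\n".join(label.ljust(width) + " | " + " ".join("X" if c else "." for c in cells)
                     for label, cells in zip(labels, matrix))

schemata = {"sales": ["orders", "customers"], "hr": ["employees"]}
dependencies = {("sales.orders", "sales.customers"), ("sales.orders", "hr.employees")}
labels, matrix = dependency_structure_matrix(schemata, dependencies)
print(render(labels, matrix))
```

Ordering rows and columns by the hierarchy keeps each schema's elements in a contiguous block, so dependencies that cross schema boundaries show up outside the diagonal blocks.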
Abstract:
Methods, systems, and computer program products are provided for generating application-aware data partitioning to support parallel computing. A label for a user defined data partitioning (UDP) key is generated by a labeling process to configure data partitions of original data. The UDP key is labeled by the labeling process to include at least one key property excluded from the original data. The data partitions are evenly distributed to co-locate and balance the data partitions and corresponding computations performed by computational servers. A data record of the data partitions is retrieved by performing an all-node parallel search of the computational servers using the UDP key.
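A sketch of the three steps (labeling, balanced distribution, all-node parallel search), with threads standing in for computational servers. `label_udp_key` is a hypothetical labeling rule, and largest-first greedy placement is one possible balancing strategy, not necessarily the one claimed.

```python
from concurrent.futures import ThreadPoolExecutor

def label_udp_key(record):
    """Hypothetical labeling process: derive a key property that is not a field
    of the original record (here, an alphabetical range of the name)."""
    return "A-M" if record["name"][:1].upper() <= "M" else "N-Z"

def build_partitions(records):
    partitions = {}
    for record in records:
        partitions.setdefault(label_udp_key(record), []).append(record)
    return partitions

def distribute(partitions, num_servers):
    """Greedy balancing: place the largest partitions first, each on the
    currently least-loaded server."""
    servers = [[] for _ in range(num_servers)]
    loads = [0] * num_servers
    for label in sorted(partitions, key=lambda label: len(partitions[label]), reverse=True):
        target = loads.index(min(loads))
        servers[target].append((label, partitions[label]))
        loads[target] += len(partitions[label])
    return servers

def all_node_search(servers, udp_key, predicate):
    """Search every server in parallel for matching records under udp_key."""
    def search(server):
        return [record for label, records in server if label == udp_key
                for record in records if predicate(record)]
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        return [record for found in pool.map(search, servers) for record in found]

records = [{"name": "alice", "id": 1}, {"name": "bob", "id": 2},
           {"name": "zoe", "id": 3}, {"name": "nina", "id": 4}]
servers = distribute(build_partitions(records), 2)
print(all_node_search(servers, "N-Z", lambda r: r["id"] > 3))
```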
Abstract:
A database management system provides the capability to perform cluster analysis, with improved performance in model building and data mining, good integration with the various databases throughout the enterprise, and flexible specification and adjustment of the models being built. It also makes data mining functionality accessible to users with limited data mining expertise and reduces development time and cost for data mining projects. A database management system for in-database clustering comprises a first data table and a second data table, each including a plurality of rows of data; means for building a clustering model from a portion of the first data table, the portion being selected by partitioning, density summarization, or active sampling of the first data table; and means for applying the clustering model to the second data table to generate apply output data.
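A sketch of the build/apply split reduced to plain Python: k-means, a swapped-in algorithm the abstract does not name, is fit on a random sample of the first table and then applied to the second. Random sampling stands in for the partitioning, density summarization, or active sampling options, and all function names are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(centroids, point):
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], point))

def build_clustering_model(first_table, k, sample_size, iterations=20, seed=0):
    """Fit k-means on a random sample of the first table (k <= sample size)."""
    rng = random.Random(seed)
    sample = rng.sample(first_table, min(sample_size, len(first_table)))
    # Farthest-first initialization keeps the result deterministic for a seed.
    centroids = [list(sample[0])]
    while len(centroids) < k:
        farthest = max(sample, key=lambda p: min(squared_distance(c, p) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in sample:
            clusters[nearest_centroid(centroids, point)].append(point)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def apply_clustering_model(centroids, second_table):
    """Apply output: each row of the second table with its cluster id."""
    return [(row, nearest_centroid(centroids, row)) for row in second_table]

first_table = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
model = build_clustering_model(first_table, k=2, sample_size=6)
print(apply_clustering_model(model, [(0.5, 0.5), (9.5, 9.5), (0.2, 0.1)]))
```

Building on a sample bounds the cost of model construction, while applying the model is a single pass over the second table.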
Abstract:
A method and system are provided to process data transactions in a data store that includes a plurality of databases. The system may comprise a computer interface module to receive a data transaction request from at least one requesting computer and a data store interface module to interface the system to the plurality of databases. The system also includes a data access layer defining an abstraction layer to identify at least one database of the plurality of databases. The data transaction request may be an object-oriented request, and the plurality of databases may be horizontally distributed, in which case the data access layer defines an object-oriented abstraction layer between the computer interface module and the plurality of databases. In one embodiment, a data-dependent routing module is provided that generates a query to a database identified from the content of the data in the data transaction request.
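A sketch of data-dependent routing, with in-memory SQLite databases standing in for the horizontally distributed databases. The range split on `customer_id`, the request dictionary (used in place of an object-oriented request), and the class name are hypothetical.

```python
import bisect
import sqlite3

def make_shard():
    """One horizontally partitioned database (an in-memory SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    return conn

class DataDependentRouter:
    """Picks a database from the content of the request and generates the
    query for it (routing rule: a hypothetical range split on customer_id)."""

    def __init__(self, boundaries, shards):
        # boundaries[i] is the smallest customer_id stored in shards[i + 1].
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        self.boundaries = boundaries
        self.shards = shards

    def route(self, request):
        return self.shards[bisect.bisect_right(self.boundaries, request["customer_id"])]

    def execute(self, request):
        shard = self.route(request)
        if request["op"] == "put":
            shard.execute(
                "INSERT OR REPLACE INTO customers (customer_id, name) VALUES (?, ?)",
                (request["customer_id"], request["name"]),
            )
            return None
        row = shard.execute(
            "SELECT name FROM customers WHERE customer_id = ?", (request["customer_id"],)
        ).fetchone()
        return row[0] if row else None

shards = [make_shard(), make_shard()]
router = DataDependentRouter([1000], shards)
router.execute({"op": "put", "customer_id": 5, "name": "Ada"})
router.execute({"op": "put", "customer_id": 1500, "name": "Bo"})
print(router.execute({"op": "get", "customer_id": 1500}))  # Bo
```

Callers never name a database; the router derives it from the request data, which is what lets the databases be split or added behind the abstraction layer.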