Abstract:
A data storage system is provided in which a migration planner generates migration plans for reaching a goal configuration from an initial configuration within a predetermined period of time. The planner initially makes terminal moves until no further terminal moves can be made. A shunt move is then made, chosen to reduce contention (the total size of the data stores that need to be moved onto a particular data storage device divided by the amount of excess capacity on that device). The shunt selected is the one that leads to the lowest contention for the data storage system as a whole. The planner then returns to making terminal moves to develop the migration plan. By determining the existence and utilization dependencies among the various moves, independent moves are identified so they can be executed in parallel with dependent moves. This produces parallelized migration plans which run much faster than sequential ones.
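A minimal sketch of the contention metric and shunt selection described above, in Python. The `Device` structure, the aggregate (worst per-device contention), and the tuple shape of a shunt are illustrative assumptions, not the patented algorithm itself:

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    capacity: int      # total capacity of the device
    used: int          # capacity currently occupied
    incoming: int = 0  # total size of stores still to be moved onto this device

    @property
    def contention(self) -> float:
        # Contention per the abstract: size of data still to move onto the
        # device divided by the device's excess (free) capacity.
        free = self.capacity - self.used
        return float("inf") if free <= 0 else self.incoming / free

def system_contention(devices):
    # Aggregate as the worst per-device contention (one plausible choice).
    return max(d.contention for d in devices)

def pick_shunt(shunts, devices):
    """Each shunt is (size, src, dst): park a store of `size` from `src` on
    an intermediate device `dst`. Pick the shunt that leaves the system
    with the lowest overall contention."""
    def score(shunt):
        size, src, dst = shunt
        src.used -= size; dst.used += size   # tentatively apply the move
        s = system_contention(devices)
        src.used += size; dst.used -= size   # undo it
        return s
    return min(shunts, key=score)

a = Device("A", capacity=100, used=95, incoming=20)  # 5 free, contention 4.0
b = Device("B", capacity=100, used=40)
c = Device("C", capacity=100, used=90)
# Shunting a 10 GB store off A onto B beats shunting it onto C,
# which would leave C with no excess capacity (infinite contention):
size, src, dst = pick_shunt([(10, a, b), (10, a, c)], [a, b, c])
print(dst.name)  # B
```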
Abstract:
A data storage system is provided in which a migration planner generates migration plans for reaching a goal configuration from an initial configuration within a predetermined period of time. The planner initially makes terminal moves until no further terminal moves can be made. A shunt move is then made, chosen to reduce contention (the total size of the data stores that need to be moved onto a particular data storage device divided by the amount of excess capacity on that device). The shunt selected is the one that leads to the lowest contention for the data storage system as a whole. The planner then returns to making terminal moves to develop the migration plan. Further migration plans are provided.
Abstract:
Improved storage systems which use write off-loading are described. When a request to store data in a particular storage location is received and that location is unavailable, the data is stored in an alternative location. In an embodiment, the particular storage location may be unavailable because it is powered down or because it is overloaded. The data stored in the alternative location may subsequently be recovered and written to the particular storage location once it becomes available.
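A minimal sketch of this off-load-and-reclaim behaviour, assuming a simple volume API (`is_available`, `write`, `read`, `delete`) and an in-memory redirect log; all names are illustrative:

```python
class OffloadingStore:
    """Write off-loading sketch: if the target volume is unavailable
    (e.g. spun down or overloaded), write to an alternative volume and
    remember the redirect so the data can be reclaimed later."""

    def __init__(self, volumes):
        self.volumes = volumes   # name -> volume object (assumed API)
        self.redirects = []      # (intended_volume, alt_volume, key)

    def write(self, volume_name, key, data):
        target = self.volumes[volume_name]
        if target.is_available():
            target.write(key, data)
            return
        # Off-load: pick any available alternative location
        # (the sketch assumes at least one volume is available).
        alt = next(n for n, v in self.volumes.items() if v.is_available())
        self.volumes[alt].write(key, data)
        self.redirects.append((volume_name, alt, key))

    def reclaim(self):
        """Lazily move off-loaded data back once the intended
        volume becomes available again."""
        remaining = []
        for intended, alt, key in self.redirects:
            target = self.volumes[intended]
            if target.is_available():
                target.write(key, self.volumes[alt].read(key))
                self.volumes[alt].delete(key)
            else:
                remaining.append((intended, alt, key))
        self.redirects = remaining
```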
Abstract:
Methods of querying a large number of endsystems are described in which metadata is replicated between endsystems. When a query is injected, an available endsystem receives a message relating to the query which identifies a range of endsystems for which that endsystem is responsible. The available endsystem then generates completeness data for the range of endsystems, based on data stored at the endsystem, and transmits this completeness data to the sender of the message. The methods may be implemented using device-executable instructions stored on device-readable media.
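A sketch of how an available endsystem might answer such a message. The message and metadata shapes (integer endsystem IDs, dict-based replicas and results) are assumptions made only for illustration:

```python
def handle_query_message(msg, local_metadata, local_results):
    """An available endsystem handles a query message naming the range of
    endsystems it is responsible for. It replies with the results it holds
    plus completeness data saying which endsystems in the range are actually
    covered by its locally replicated metadata."""
    lo, hi = msg["range"]                   # assigned range of endsystem IDs
    assigned = set(range(lo, hi + 1))
    covered = {e for e in assigned if e in local_metadata}  # replicas held here
    return {                                # reply sent back to the message sender
        "query_id": msg["query_id"],
        "results": [local_results[e] for e in covered if e in local_results],
        "completeness": {
            "covered": sorted(covered),
            "missing": sorted(assigned - covered),  # sender may re-issue for these
        },
    }
```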
Abstract:
Resource optimization for online services is described. In one example, objects (such as mailboxes or other data associated with an online service) are assigned to network elements (such as servers) by inferring a relationship graph from log data relating to usage of the online service. The graph has a node for each object, and connections between each pair of objects having data items in common. Each connection has a weight relating to the number of common data items. The graph is partitioned into a set of clusters, such that each cluster has nodes joined by connections with a high weight relative to the weight of connections between nodes in different clusters. The objects are then distributed to the network elements such that objects corresponding to nodes in the same cluster are located on the same network element.
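A sketch of the three steps above: inferring the weighted graph from usage logs, partitioning it, and placing clusters on servers. The abstract does not fix a partitioning algorithm, so the threshold-based union-find below is only a naive stand-in:

```python
from collections import defaultdict
from itertools import combinations

def build_relationship_graph(log):
    """log: iterable of (object_id, data_item_id) usage records.
    Edge weight = number of data items two objects have in common."""
    items_to_objects = defaultdict(set)
    for obj, item in log:
        items_to_objects[item].add(obj)
    weights = defaultdict(int)
    for objs in items_to_objects.values():
        for a, b in combinations(sorted(objs), 2):
            weights[(a, b)] += 1
    return weights

def cluster(weights, threshold=1):
    """Naive stand-in for the partitioning step: union objects joined by
    edges heavier than `threshold`, so intra-cluster weights are high
    relative to inter-cluster weights."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for (a, b), w in weights.items():
        ra, rb = find(a), find(b)
        if w > threshold and ra != rb:
            parent[ra] = rb
    groups = defaultdict(list)
    for x in list(parent):
        groups[find(x)].append(x)
    return list(groups.values())

def place(clusters, servers):
    """Assign whole clusters to servers so that co-clustered objects
    land on the same network element."""
    placement = {}
    for i, members in enumerate(clusters):
        for obj in members:
            placement[obj] = servers[i % len(servers)]
    return placement
```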
Abstract:
A prediction system may perform capacity planning for one or more resources of a database system, such as by understanding how different workloads are using the system resources and/or predicting how the performance of the workloads will change when the hardware configuration of the resource is changed and/or when the workload changes. The prediction system may use detailed, low-level tracing of a live database system running an application workload to monitor the performance of the current database system. In this manner, the current monitoring traces and analysis may be combined with a simulation to predict the workload's performance on a different hardware configuration. More specifically, performance may be indicated as throughput and/or latency, which may be reported for all transactions, for a particular transaction type, and/or for an individual transaction. Database system performance prediction may include instrumentation and tracing, demand trace extraction, cache simulation, disk scaling, CPU scaling, background activity prediction, throughput analysis, latency analysis, visualization, optimization, and the like.
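To make the cache-simulation step concrete, here is a minimal sketch that replays a captured page-access trace through a buffer cache of a hypothetical new size to predict the hit ratio. Choosing LRU as the replacement policy is an assumption; the pipeline's actual simulator is not specified here:

```python
from collections import OrderedDict

def simulate_lru(trace, cache_pages):
    """Replay a trace of page IDs (as would be extracted from low-level
    traces of the live system) through an LRU cache of `cache_pages`
    entries and report the predicted hit ratio."""
    cache, hits = OrderedDict(), 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark most-recently used
        else:
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least-recently used
    return hits / len(trace) if trace else 0.0

# Predict how doubling the buffer pool would change the hit ratio:
trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]
for size in (2, 4):
    print(size, simulate_lru(trace, size))   # 2 -> 0.0, 4 -> 0.6
```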
Abstract:
Methods of generating filters automatically from data processing jobs are described. In an embodiment, these filters are generated automatically from a compiled version of the data processing job using static analysis applied to a high-level representation of the job. The executable filter is arranged to suppress rows and/or columns within the data to which the job is applied that do not affect the output of the job. The filters are generated by a filter generator and then stored and applied dynamically at a filtering proxy that may be co-located with the storage node holding the data. In another embodiment, the filtered data may be cached close to a compute node which runs the job, and data may be provided to the compute node from the local cache rather than from the filtering proxy.
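A sketch of what such a generated filter could look like when applied at the filtering proxy. Here `used_columns` and `row_predicate` stand in for facts the static analysis would recover from the compiled job; both are hypothetical inputs for illustration:

```python
def make_filter(used_columns, row_predicate):
    """Build an executable filter: keep only the columns the job actually
    reads, and drop rows the job's own predicates would discard anyway."""
    def apply(rows):
        for row in rows:                        # row: dict of column -> value
            if row_predicate(row):
                yield {c: row[c] for c in used_columns}
    return apply

# E.g. suppose static analysis found the job only reads `url` and `status`,
# and only for rows with status >= 400:
flt = make_filter({"url", "status"}, lambda r: r["status"] >= 400)
rows = [{"url": "/a", "status": 200, "body": "..."},
        {"url": "/b", "status": 404, "body": "..."}]
print(list(flt(rows)))   # one row survives, with only url and status columns
```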