Abstract:
A method for managing data items retrieved for storage in a prefetch memory buffer includes determining a probability that a first data item will be requested for retrieval. The method includes estimating a first request time at which the first data item will be requested. The method also includes determining a time differential for the first data item, wherein the time differential is determined based on the current time and the first request time. The method includes calculating a first prefetch priority value for the first data item based on the first data item probability and the time differential. The method includes comparing the first prefetch priority value of the first data item to the prefetch priority values of one or more stored data items to identify at least one stored data item having a prefetch priority value lower than the first prefetch priority value.
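The steps in this abstract can be illustrated with a minimal Python sketch. The abstract does not give a formula for the priority value, so the one used here (probability divided by one plus the time differential) is an assumption chosen only for illustration; the names `PrefetchItem`, `prefetch_priority`, and `find_lower_priority_items` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrefetchItem:
    key: str
    probability: float   # estimated chance the item will be requested
    request_time: float  # estimated time of the request, in seconds


def prefetch_priority(item: PrefetchItem, now: float) -> float:
    """Combine request probability and time differential into one value.

    Assumed formula (not from the abstract): items that are likely to be
    requested soon score higher than unlikely or far-off items.
    """
    time_differential = max(item.request_time - now, 0.0)
    return item.probability / (1.0 + time_differential)


def find_lower_priority_items(new_item, stored_items, now):
    """Return stored items whose priority is lower than the new item's."""
    new_priority = prefetch_priority(new_item, now)
    return [s for s in stored_items if prefetch_priority(s, now) < new_priority]


now = 10.0
stored = [
    PrefetchItem("a", probability=0.9, request_time=11.0),
    PrefetchItem("b", probability=0.5, request_time=19.0),
]
candidate = PrefetchItem("c", probability=0.8, request_time=12.0)
lower = find_lower_priority_items(candidate, stored, now)
print([item.key for item in lower])
```

In this sketch the stored items found by `find_lower_priority_items` would be the natural candidates to displace from the buffer, although the abstract itself stops at identifying them.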
Abstract:
Systems and methods are discussed relating to allocation of memory from a fixed pool of fast memory within a data center having a data storage area equipped with that memory. Techniques include: receiving a request to write data in the storage area; identifying a file group associated with the write request; analyzing previous data activity traces associated with the file group; determining an available fast memory amount based on the total amount of fast memory in the fixed pool and a currently allocated amount of fast memory; determining a fast memory allocation for the file group based on the previous data activity traces, the available fast memory, and a fast memory constraint, the memory allocation including an allocation amount and a write probability; and providing information about the memory allocation to a file system of the data center, which writes the data based on the allocation amount and write probability.
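A rough Python sketch of the allocation step follows. The abstract does not say how traces, available memory, and the constraint are combined, so everything here is an assumption: demand is taken as the average bytes written in earlier traces, the constraint is modeled as a maximum share of the available pool (`max_share`), and the write probability is taken to be the fraction of the group's demand that fits in the granted amount.

```python
def allocate_fast_memory(total_pool, currently_allocated, traces, max_share=0.25):
    """Return an allocation amount and write probability for one file group.

    total_pool, currently_allocated: bytes of fast memory in the fixed pool.
    traces: list of dicts with a hypothetical "bytes_written" field.
    max_share: assumed constraint, the largest fraction of available fast
               memory that a single file group may receive.
    """
    available = max(total_pool - currently_allocated, 0)

    # Assumed demand estimate: mean bytes written across earlier traces.
    demand = sum(t["bytes_written"] for t in traces) / len(traces) if traces else 0

    cap = int(available * max_share)
    amount = min(int(demand), cap)

    # Assumed meaning: share of the group's writes directed to fast memory.
    write_probability = min(amount / demand, 1.0) if demand > 0 else 0.0
    return {"allocation_amount": amount, "write_probability": write_probability}


traces = [{"bytes_written": 300}, {"bytes_written": 500}]
result = allocate_fast_memory(1000, 600, traces, max_share=0.5)
print(result)
```

A file system receiving this result could then send each write to fast memory with the given probability until the allocation amount is used up, which is one plausible reading of "writes the data based on the allocation amount and write probability".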
Abstract:
Systems, mediums, and methods are provided for scheduling input/output requests to a storage system. The input/output requests may be received, categorized based on their priority, and scheduled for retrieval from the storage system. Lower priority requests may be divided into smaller sub-requests, and the sub-requests may be scheduled for retrieval only when there are no pending higher priority requests, and/or when higher priority requests are not predicted to arrive for a certain period of time. By servicing the small sub-requests rather than the entire lower priority request, the retrieval of the lower priority request may be paused in the event that a high priority request arrives while the lower priority request is being serviced.
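The splitting idea can be sketched in Python as below. This is a simplified illustration, not the scheduler the abstract describes in full: it uses two priority classes only, a fixed hypothetical `chunk_size`, and it omits the prediction of when high priority requests will arrive. The class and method names are assumptions.

```python
from collections import deque


def split_request(offset, length, chunk_size):
    """Divide one byte range into (offset, size) sub-requests."""
    end = offset + length
    return [(start, min(chunk_size, end - start))
            for start in range(offset, end, chunk_size)]


class Scheduler:
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.high = deque()  # whole high priority requests
        self.low = deque()   # sub-requests of low priority requests

    def submit(self, name, offset, length, high_priority):
        if high_priority:
            self.high.append((name, offset, length))
        else:
            for start, size in split_request(offset, length, self.chunk_size):
                self.low.append((name, start, size))

    def next_io(self):
        """Serve low priority sub-requests only when no high priority is pending."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


sched = Scheduler(chunk_size=4)
sched.submit("scan", 0, 10, high_priority=False)
order = [sched.next_io()]                          # first chunk of the scan
sched.submit("read", 100, 3, high_priority=True)   # arrives mid-scan
order += [sched.next_io(), sched.next_io(), sched.next_io(), sched.next_io()]
print(order)
```

Because the low priority request is served one chunk at a time, the high priority read only waits for the chunk in flight rather than for the whole scan.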
Abstract:
A method includes receiving trace data representing access information about files stored in a large-scale distributed storage system, identifying file access patterns based on the trace data, receiving metadata information associated with the files stored in the large-scale distributed storage system, and generating a preferred storage parameter for each file based on the received metadata information and the identified file access patterns. The method also includes receiving file reliability or accessibility information of a new file, determining whether the received file reliability or accessibility information of the new file matches information of a file group of the files in the large-scale distributed storage system, and when the file reliability or accessibility information of the new file matches the information of the file group, storing the new file in the large-scale distributed storage system using the preferred storage parameter associated with the file group.
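As a hedged illustration of the matching step, the Python sketch below groups files by their reliability and accessibility metadata, derives one parameter per group from read counts in the traces, and looks up that parameter for a new file. The abstract does not name a specific storage parameter or pattern metric; the replica count, the `hot_threshold` rule, and all field names here are assumptions.

```python
from collections import defaultdict


def build_group_parameters(files, traces, hot_threshold=2):
    """Derive one preferred storage parameter per file group.

    files:  dicts with hypothetical "name", "reliability", "accessibility".
    traces: dicts with hypothetical "file" and "op" fields.
    """
    reads = defaultdict(int)
    for t in traces:
        if t["op"] == "read":
            reads[t["file"]] += 1

    group_reads = defaultdict(list)
    for f in files:
        group = (f["reliability"], f["accessibility"])
        group_reads[group].append(reads[f["name"]])

    params = {}
    for group, counts in group_reads.items():
        average_reads = sum(counts) / len(counts)
        # Assumed rule: frequently read groups get an extra replica.
        params[group] = {"replicas": 3 if average_reads >= hot_threshold else 2}
    return params


def choose_parameter(new_file, params, default):
    """Use the group's parameter when the new file's metadata matches a group."""
    group = (new_file["reliability"], new_file["accessibility"])
    return params.get(group, default)


files = [
    {"name": "a", "reliability": "high", "accessibility": "fast"},
    {"name": "b", "reliability": "high", "accessibility": "fast"},
    {"name": "c", "reliability": "low", "accessibility": "slow"},
]
traces = [
    {"file": "a", "op": "read"}, {"file": "a", "op": "read"},
    {"file": "a", "op": "read"}, {"file": "b", "op": "read"},
    {"file": "c", "op": "write"},
]
params = build_group_parameters(files, traces)
new_file = {"name": "d", "reliability": "high", "accessibility": "fast"}
print(choose_parameter(new_file, params, default={"replicas": 2}))
```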
Abstract:
Methods to determine and automatically recommend or adjust configuration parameters for storing files in large-scale distributed storage systems are disclosed. These methods may receive file metadata and trace data that allow the system to identify file access patterns. Additionally, the methods may receive information about the distributed storage systems in a datacenter. This information can be used to choose storage parameters on a per-file basis.