Abstract:
A method, non-transitory computer readable medium, and device for prefetching include identifying a candidate data block from among one or more immediate successor data blocks. The identified candidate data block has a historical access probability value, measured from an initially accessed data block, that is higher than the historical access probability value of each of the other immediate successor data blocks and is above a prefetch threshold value. The identifying is repeated until a next identified candidate data block has a historical access probability value below the prefetch threshold value. In each repetition, the next immediate successor data blocks are identified from the previously identified candidate data block, and the historical access probability value of each of these next immediate successor data blocks is determined from the initially accessed data block. Each identified candidate data block whose historical access probability value is above the prefetch threshold value is fetched.
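The iterative candidate selection described above can be sketched in Python. This is a minimal sketch under assumptions the abstract does not state: the access history is a hypothetical `transitions` mapping of observed next-block probabilities, and a successor's probability "from the initially accessed data block" is taken to be the product of the transition probabilities along the chain. The function name `select_prefetch_blocks` is illustrative only.

```python
from typing import Dict, List

# Hypothetical access history: transitions[a][b] is the observed probability
# that block b is accessed immediately after block a.
Transitions = Dict[str, Dict[str, float]]


def select_prefetch_blocks(transitions: Transitions, initial: str,
                           threshold: float) -> List[str]:
    """Follow the most probable successor chain from `initial`, collecting
    every candidate whose probability (measured from `initial`) is above
    `threshold`."""
    selected: List[str] = []
    current = initial
    prob_from_initial = 1.0  # probability of reaching `current` from `initial`
    visited = {initial}      # guard against cycles in the access history
    while True:
        successors = transitions.get(current, {})
        if not successors:
            break
        # Assumption: probability from the initial block is the product of
        # the transition probabilities along the chain.
        scored = {blk: prob_from_initial * p for blk, p in successors.items()}
        candidate, prob = max(scored.items(), key=lambda kv: kv[1])
        # Stop once the best candidate is no longer above the threshold.
        if prob <= threshold or candidate in visited:
            break
        selected.append(candidate)
        visited.add(candidate)
        current, prob_from_initial = candidate, prob
    return selected
```

For example, with `transitions = {"A": {"B": 0.8, "C": 0.2}, "B": {"D": 0.9, "E": 0.1}, "D": {"F": 0.5}}` and a threshold of 0.5, the sketch selects `B` (about 0.8) and then `D` (about 0.72), and stops at `F` (about 0.36); the caller would then fetch `["B", "D"]`. The `visited` check is an added design choice to avoid looping on cyclic histories and is not described in the abstract.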
Abstract:
Systems, devices, and methods are described for performing content-aware task assignment. A resource manager in a distributed computing system can identify tasks associated with a file. Each task can involve processing multiple data blocks of the file (e.g., in parallel with processing by other tasks). The resource manager can provide block identifiers for the blocks to each of multiple computing nodes. Each computing node can store a respective subset of the blocks in a respective cache storage medium, and each subset of blocks stored at a node can be identified from the block identifiers. The resource manager can assign a task to a selected one of the computing nodes based on the selected computing node having a larger subset of the task's blocks than one or more other computing nodes in the distributed computing system. In some embodiments, computing nodes can de-duplicate cached data using block identifiers.
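The assignment rule and the identifier-based de-duplication can be sketched as follows. This is a minimal sketch, assuming that a block identifier is a SHA-256 hash of the block's content (so identical blocks share one identifier) and that each node keeps an in-memory cache keyed by that identifier; `Node`, `block_id`, and `assign_task` are hypothetical names, not an API from the source.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


def block_id(data: bytes) -> str:
    """Assumption: a block identifier is a content hash of the block."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Node:
    """Hypothetical computing node whose cache is keyed by block identifier."""
    name: str
    cache: Dict[str, bytes] = field(default_factory=dict)

    def cache_block(self, data: bytes) -> bool:
        """Store a block once per identifier; return False for a duplicate."""
        bid = block_id(data)
        if bid in self.cache:
            return False  # already cached: de-duplicated by identifier
        self.cache[bid] = data
        return True

    def cached_subset(self, block_ids: Iterable[str]) -> Set[str]:
        """Return the identifiers from `block_ids` that this node caches."""
        return {b for b in block_ids if b in self.cache}


def assign_task(task_block_ids: List[str], nodes: List[Node]) -> Node:
    """Pick the node caching the largest subset of the task's blocks.
    Ties go to the earliest node in `nodes`."""
    return max(nodes, key=lambda n: len(n.cached_subset(task_block_ids)))
```

For instance, if node `n1` caches one of a task's three blocks and node `n2` caches the other two, `assign_task` returns `n2`. Breaking ties by list order is an arbitrary choice of this sketch.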
Abstract:
A method, non-transitory computer readable medium, and system node computing device that generate a snapshot identifier and return the snapshot identifier in response to a received request to create a snapshot of a NoSQL database. A determination is made whether an entry in a transaction table has a first transaction value corresponding to a transaction that has been committed and a second transaction value that is either unassigned or corresponds to another transaction that has not been committed. The snapshot identifier is inserted into the entry when the entry is determined to have both a first transaction value corresponding to a committed transaction and a second transaction value that is unassigned or corresponds to an uncommitted transaction.
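The entry check can be sketched in Python. This is a minimal sketch under stated assumptions: each transaction-table entry is modeled as a hypothetical `Entry` holding a first and second transaction value (with `None` meaning unassigned), commit status is tracked in a plain set of committed transaction numbers, and the snapshot identifier is a random UUID string. None of these representations come from the source.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Entry:
    """Hypothetical transaction-table entry."""
    first_txn: Optional[int]   # transaction value expected to be committed
    second_txn: Optional[int]  # None means "not assigned"
    snapshot_ids: List[str] = field(default_factory=list)


def create_snapshot(table: List[Entry], committed: Set[int]) -> str:
    """Generate a snapshot identifier, insert it into every qualifying
    entry, and return it to the caller."""
    snapshot_id = str(uuid.uuid4())  # assumption: UUIDs as snapshot ids
    for entry in table:
        first_committed = (entry.first_txn is not None
                           and entry.first_txn in committed)
        second_open = (entry.second_txn is None
                       or entry.second_txn not in committed)
        if first_committed and second_open:
            entry.snapshot_ids.append(snapshot_id)
    return snapshot_id
```

With transactions 1 and 2 committed, entries `(1, None)` and `(1, 3)` receive the snapshot identifier, while `(1, 2)` (second transaction committed) and `(3, None)` (first transaction uncommitted) do not.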
Abstract:
A method, non-transitory computer readable medium, and system node computing device that facilitate a NoSQL datastore with integrated management. In some embodiments, this technology provides a fast, highly available, application-integrated NoSQL database that can be established in a data storage network such that various data management policies are implemented automatically. This technology enables application administrators to leverage NoSQL databases more effectively by storing data in tables located on storage nodes in groups and zones that have associated SLCs, as established upon creation of the tables or of an associated entity group or database. Accordingly, management of the data is more integrated, and data tiering can be implemented more efficiently. This technology also provides a highly scalable infrastructure that can dynamically add capacity with predictable, established service levels and that optimizes the storage of data across types of media with different characteristics in order to provide cost-effective storage.