-
Publication No.: US11435910B2
Publication date: 2022-09-06
Application No.: US16670715
Filing date: 2019-10-31
Inventors: Mikhail Danilov, Yohannes Altaye
IPC classification: G06F3/06
Abstract: A mapped redundant array of independent nodes (mapped RAIN) for data storage is disclosed. A mapped RAIN cluster can be allocated on top of one or more real data clusters, wherein the real clusters can comprise storage devices of different storage capacities. Mapping of data storage locations for a mapped RAIN cluster to real storage devices can be based on an affinity value determined for pairs of real nodes of the real data clusters. A normalized affinity can be employed to allocate real storage to mapped nodes of mapped clusters based on the heterogeneous capacities of the storage devices. This can provide improved data availability and data recovery compared with other techniques, where heterogeneous hardware can make efficient resource allocation a non-trivial task. The disclosed subject matter can facilitate more efficient allocation of mapped RAINs in a heterogeneous cluster storage construct.
-
72.
Publication No.: US11349501B2
Publication date: 2022-05-31
Application No.: US16803920
Filing date: 2020-02-27
Inventors: Mikhail Danilov, Yohannes Altaye
Abstract: Multistep recovery of chunk fragments of a peer group employing hierarchical erasure coding for geographically diverse data storage protection is disclosed. A peer group of chunks can employ zone-level erasure coding of chunks that can each employ chunk-level erasure coding. In a first iteration, fragment recovery can be performed across peer group chunks based on the zone-level erasure coding. The first iteration can then recover other fragments within a chunk based on the chunk-level erasure coding. Where additional fragments remain to be recovered, subsequent iterations can be performed. The disclosed multistep recovery can enable recovery of fragments that would typically have been considered unrecoverable via conventional techniques. Additionally, recovering fragments across a peer group of chunks in this way can be more computing-resource efficient than recovering whole chunks across the peer group.
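The iterative idea can be illustrated with a toy sketch, not the patented method. It assumes simple XOR parity at both levels (the abstract does not name the erasure code): each chunk gets one chunk-level parity fragment, and the peer group gets one zone-level parity chunk. Fragments are small integers, and the names `build_grid` and `recover` are hypothetical. A gap that neither level can fix on its own may become fixable after the other level has filled a neighbouring gap.

```python
from functools import reduce
from operator import xor
from typing import List, Optional

# Rows are chunks of a peer group; columns are fragment positions.
Grid = List[List[Optional[int]]]


def build_grid(data: List[List[int]]) -> Grid:
    """Append a chunk-level XOR parity fragment to each chunk (last column),
    then a zone-level XOR parity chunk across the peer group (last row)."""
    chunks = [row + [reduce(xor, row, 0)] for row in data]
    zone_parity = [reduce(xor, column, 0) for column in zip(*chunks)]
    return chunks + [zone_parity]


def recover(grid: Grid) -> bool:
    """Alternate zone-level and chunk-level passes until no pass makes progress.
    Returns True when every fragment has been rebuilt."""
    progress = True
    while progress:
        progress = False
        # Zone level: one fragment position across all chunks (a column).
        for c in range(len(grid[0])):
            gaps = [r for r in range(len(grid)) if grid[r][c] is None]
            if len(gaps) == 1:
                others = (grid[r][c] for r in range(len(grid)) if r != gaps[0])
                grid[gaps[0]][c] = reduce(xor, others, 0)
                progress = True
        # Chunk level: all fragments of one chunk (a row).
        for row in grid:
            gaps = [c for c, v in enumerate(row) if v is None]
            if len(gaps) == 1:
                row[gaps[0]] = reduce(xor, (v for v in row if v is not None), 0)
                progress = True
    return all(v is not None for row in grid for v in row)
```

For example, losing fragments (0,0), (0,1) and (1,0) of a 3x3 data layout defeats a single chunk-level pass on chunk 0, but the zone-level pass rebuilds (0,1) first, after which the chunk-level passes finish the job. A 2x2 block of losses, by contrast, stays unrecoverable under this simple parity scheme.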
-
Publication No.: US11340834B2
Publication date: 2022-05-24
Application No.: US16881556
Filing date: 2020-05-22
Inventors: Mikhail Danilov, Yohannes Altaye
IPC classification: G06F3/06
Abstract: Improved scaling of an ordered event stream (OES) is disclosed. Conventional scaling of an OES, in immediate response to loading that exceeds a given processor performance level, merely divides a segment into segments having similar key space sizes. In contrast, the disclosed subject matter determines an alternate OES topology. The alternate OES topology can be selected from among ranked alternate OES topologies, and can be implemented where its expected performance will meet a threshold level of improvement over the existing OES topology. Moreover, the alternate OES topology of the disclosed subject matter can comprise two or more new segments that can have dissimilar key space sizes. Additionally, the two or more new segments of the alternate OES topology can provide the same, or similar, loading relative to the performance levels of the corresponding processing instances, even where those performance levels are also dissimilar.
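A minimal sketch of splitting one segment into new segments of dissimilar key space sizes, assuming keys are spread uniformly over the segment's key range and that each processing instance has a known integer performance level. The function names and the "peak utilization" threshold test are hypothetical stand-ins for the ranking and threshold described in the abstract.

```python
from fractions import Fraction
from typing import List, Tuple

KeyRange = Tuple[Fraction, Fraction]


def split_key_space(lo: Fraction, hi: Fraction, performance: List[int]) -> List[KeyRange]:
    """Split [lo, hi) into sub-ranges whose widths are proportional to the
    performance level of the processing instance that will own each one."""
    total = sum(performance)
    ranges: List[KeyRange] = []
    start = lo
    for perf in performance:
        end = start + (hi - lo) * Fraction(perf, total)
        ranges.append((start, end))
        start = end
    return ranges


def utilization(ranges: List[KeyRange], performance: List[int], load: Fraction) -> List[Fraction]:
    """Load-to-performance ratio per segment, assuming a uniform key distribution."""
    span = ranges[-1][1] - ranges[0][0]
    return [load * (end - start) / span / perf
            for (start, end), perf in zip(ranges, performance)]


def should_apply(current_peak: Fraction, candidate_peak: Fraction, threshold: Fraction) -> bool:
    """Adopt a candidate topology only if it lowers peak utilization by at least `threshold`."""
    return current_peak - candidate_peak >= threshold
```

With instances of performance 1 and 3 and a load of 8, an even split gives utilizations 4 and 4/3, while the proportional split gives 2 and 2. The peak drops from 4 to 2, so under a threshold of 1 the proportional topology would be applied.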
-
Publication No.: US20220066877A1
Publication date: 2022-03-03
Application No.: US17008709
Filing date: 2020-09-01
Inventors: Mikhail Danilov, Yohannes Altaye
Abstract: The disclosed technology is generally directed towards selecting storage devices, based on predicted reliability, for storing erasure-coded data fragments and coding fragments. In general, to increase data availability, data fragments, such as those for storing erasure-coded immutable data, are stored to more reliable storage devices, while coding fragments are stored to less reliable storage devices. For example, solid state drives (SSDs) tend to fail based on the total number of writes they receive over time, so the total number of writes can be used to determine predicted reliability data for an SSD. Before the data and coding fragments are written to a number of storage devices, the storage devices can be sorted by predicted reliability such that the data fragments are written to the (likely) more reliable devices and the coding fragments to the less reliable devices.
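A minimal sketch of the sort-then-assign step, assuming predicted reliability is modelled as the fraction of a hypothetical rated write endurance that remains (the abstract only says total writes can feed the prediction). `Device`, `rated_writes` and `place_fragments` are illustrative names, not part of any real API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    name: str
    writes: int        # total writes received so far
    rated_writes: int  # hypothetical endurance rating used for the prediction

    @property
    def predicted_reliability(self) -> float:
        # Assumption: more remaining endurance means more likely to survive.
        return 1.0 - self.writes / self.rated_writes


def place_fragments(devices: List[Device], data_fragments: List[str],
                    coding_fragments: List[str]) -> Dict[str, str]:
    """Map each fragment to a device: data fragments go to the most reliable
    devices, coding fragments to the remaining (less reliable) ones."""
    needed = len(data_fragments) + len(coding_fragments)
    if len(devices) < needed:
        raise ValueError(f"need {needed} devices, have {len(devices)}")
    ranked = sorted(devices, key=lambda d: d.predicted_reliability, reverse=True)
    fragments = data_fragments + coding_fragments
    return {frag: dev.name for frag, dev in zip(fragments, ranked)}
```

Placing data fragments first in the list and zipping against the ranked devices keeps the policy in one sort; a real system would also need to respect failure domains, which this sketch ignores.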
-
Publication No.: US11194638B1
Publication date: 2021-12-07
Application No.: US17200652
Filing date: 2021-03-12
Inventors: Mikhail Danilov, Yohannes Altaye
IPC classification: G06F9/54
Abstract: Deferred scaling of an ordered event stream (OES) is disclosed. In contrast to conventional scaling of an OES, the disclosed deferred scaling can defer a scaling event where an impediment or condition to committing the scaling event is determined. This can comprise storing information corresponding to the scaling event as a virtual scaling event. In some embodiments, the virtual scaling event can be converted to an implemented scaling event at a later time. In a further embodiment, the virtual scaling event can be abandoned and the OES can continue to operate according to the last committed OES topology. In other embodiments, the virtual scaling event can be employed in determining a subsequent scaling event. Optionally, the subsequent scaling event can be an implemented scaling event or another deferred scaling event.
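The commit / defer / convert / abandon lifecycle can be sketched as a small state holder. This assumes a topology is just a list of segment names and that the caller already knows whether an impediment exists; `ScalingEvent`, `EventStream` and their methods are hypothetical. Using a virtual event to shape a later scaling event is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScalingEvent:
    target_segments: List[str]  # topology the event would produce (illustrative)


@dataclass
class EventStream:
    segments: List[str]                     # last committed topology
    virtual: Optional[ScalingEvent] = None  # deferred ("virtual") scaling event

    def scale(self, event: ScalingEvent, impeded: bool) -> bool:
        """Commit the event, or store it as a virtual event if something blocks it."""
        if impeded:
            self.virtual = event
            return False
        self.segments = list(event.target_segments)
        self.virtual = None
        return True

    def retry(self, impeded: bool) -> bool:
        """Convert a stored virtual event into an implemented one once unblocked."""
        if self.virtual is None or impeded:
            return False
        return self.scale(self.virtual, impeded=False)

    def abandon(self) -> None:
        """Drop the virtual event; the stream keeps its last committed topology."""
        self.virtual = None
```

Because the committed topology is only overwritten on a successful commit, abandoning the virtual event needs no rollback.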
-
Publication No.: US20210365211A1
Publication date: 2021-11-25
Application No.: US16881556
Filing date: 2020-05-22
Inventors: Mikhail Danilov, Yohannes Altaye
IPC classification: G06F3/06
Abstract: Improved scaling of an ordered event stream (OES) is disclosed. Conventional scaling of an OES, in immediate response to loading that exceeds a given processor performance level, merely divides a segment into segments having similar key space sizes. In contrast, the disclosed subject matter determines an alternate OES topology. The alternate OES topology can be selected from among ranked alternate OES topologies, and can be implemented where its expected performance will meet a threshold level of improvement over the existing OES topology. Moreover, the alternate OES topology of the disclosed subject matter can comprise two or more new segments that can have dissimilar key space sizes. Additionally, the two or more new segments of the alternate OES topology can provide the same, or similar, loading relative to the performance levels of the corresponding processing instances, even where those performance levels are also dissimilar.
-
77.
Publication No.: US11144220B2
Publication date: 2021-10-12
Application No.: US16726428
Filing date: 2019-12-24
Inventors: Mikhail Danilov, Yohannes Altaye
Abstract: Affinity-sensitive storage of data corresponding to a doubly mapped redundant array of independent nodes, e.g., a doubly mapped cluster, in a real storage system, e.g., a real cluster, is disclosed. Different mappings of data to a doubly mapped cluster corresponding to real cluster storage locations can result in different levels of affinity between real disks and/or real nodes of the real cluster. A data storage scheme can be selected based on disk affinity scores and node affinity scores to provide access to stored data that can be more resilient against a real disk and/or a real node becoming less accessible. Further, data recovery from a real disk/node that has become less accessible can be improved where data is stored based on the disclosed disk affinity scores and/or node affinity scores.
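The abstract does not define how the affinity scores are computed, so the following is only a hypothetical scoring rule used to illustrate "select a storage scheme by disk and node affinity scores". It counts pairs of fragments of the same chunk that share a real node (node score) or the same real disk on that node (disk score), treats lower as better, and compares node scores first. `affinity_scores` and `choose_scheme` are illustrative names.

```python
from collections import Counter
from typing import Dict, List, Tuple

# chunk id -> list of (real node, real disk) locations, one per fragment
Placement = Dict[str, List[Tuple[str, str]]]


def _colocated_pairs(keys: List) -> int:
    """Number of unordered pairs of fragments that share the same key."""
    return sum(n * (n - 1) // 2 for n in Counter(keys).values())


def affinity_scores(placement: Placement) -> Tuple[int, int]:
    """Return (node_score, disk_score); lower means losing one node or disk
    takes out fewer fragments of any single chunk."""
    node_score = sum(_colocated_pairs([node for node, _ in locs]) for locs in placement.values())
    disk_score = sum(_colocated_pairs(list(locs)) for locs in placement.values())
    return node_score, disk_score


def choose_scheme(candidates: Dict[str, Placement]) -> str:
    """Pick the candidate mapping with the lowest (node, disk) affinity scores."""
    return min(candidates, key=lambda name: affinity_scores(candidates[name]))
```

Tuple comparison makes node affinity the primary criterion and disk affinity the tie-breaker; a weighted sum would be an equally plausible assumption.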
-
Publication No.: US11112978B2
Publication date: 2021-09-07
Application No.: US16781316
Filing date: 2020-02-04
Inventors: Mikhail Danilov, Yohannes Altaye
Abstract: The described technology is generally directed towards obtaining data, such as data corresponding to a read request, from a geographic zone which may not be the zone that owns the data. When a request for data (e.g., a data segment) is received by a zone that does not own the requested data, the zone evaluates statistical data to determine whether it is more efficient to obtain the requested data directly from the zone that owns it, or indirectly from one or more zones that contain related data from which the requested data can be reconstructed. If the indirect route is deemed sufficiently more efficient, the reconstruction data (e.g., counterpart segments) are obtained and processed into the requested data, e.g., by XOR-ing the counterpart data segments into the requested data segment, which is then returned to the client.
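The XOR reconstruction itself is standard, while the route decision depends on statistics the abstract does not specify. This sketch assumes the statistics reduce to one expected latency per zone, that counterpart segments are fetched in parallel (so the slowest one dominates), and that "sufficiently more efficient" means beating the direct route by a hypothetical `margin` factor.

```python
from functools import reduce
from typing import List


def xor_segments(segments: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length segments."""
    if not segments:
        raise ValueError("need at least one segment")
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))


def choose_route(direct_latency: float, counterpart_latencies: List[float],
                 margin: float = 0.8) -> str:
    """Return 'indirect' only if reconstruction is clearly cheaper than a direct read.
    Assumes counterparts are fetched in parallel, so the slowest one dominates."""
    indirect_latency = max(counterpart_latencies)
    return "indirect" if indirect_latency < direct_latency * margin else "direct"
```

If zone 3 stores `a ^ b` and zone 2 stores `b`, a non-owning zone can rebuild `a` as `xor_segments([a ^ b, b])` without contacting the owner of `a`.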
-
Publication No.: US20210271645A1
Publication date: 2021-09-02
Application No.: US16803913
Filing date: 2020-02-27
Inventors: Mikhail Danilov, Yohannes Altaye
IPC classification: G06F16/174, G06F16/17, G06F16/16, G06F16/182
Abstract: Log-based storage space management related to data convolution in a geographically diverse data storage system is disclosed. Data chunks stored in storage devices of different zones of a zone storage system can be convolved to conserve computing resources. Deletion of a chunk from a first zone can be coupled to generating another chunk in another zone to preserve the integrity of a redundant data protection scheme. In response to determining that a first chunk is to be deleted, a log can be generated that can indicate that the first chunk is available to be deleted and can indicate other affected chunks. In an aspect, the other affected chunks can comprise a convolved chunk that convolves the first chunk and at least a second chunk. Accordingly, a third chunk can be generated to facilitate deletion of the first chunk while preserving protection of the information in the second chunk. Generation of the third chunk can be deferred until a threshold condition is determined to be satisfied.
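A simplified sketch of the deferred, log-driven deletion, under several assumptions the abstract leaves open: convolution is byte-wise XOR (so a convolved chunk is `C = A ^ B`), all zones are collapsed into one dictionary, the threshold condition is simply "at least N pending log entries", and each convolved chunk appears at most once in the log. `DeletionLog`, `apply_log` and the generated chunk ids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class DeletionLog:
    threshold: int  # hypothetical trigger: number of pending entries
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (chunk to delete, affected convolved chunk)

    def record(self, chunk_id: str, convolved_id: str) -> None:
        self.entries.append((chunk_id, convolved_id))


def apply_log(store: Dict[str, bytes], log: DeletionLog) -> List[str]:
    """Once the threshold is met, replace each affected convolved chunk with a third
    chunk that no longer depends on the deleted chunk, then drop the deleted chunks."""
    if len(log.entries) < log.threshold:
        return []  # deferred: nothing is rewritten yet
    created = []
    for deleted_id, convolved_id in log.entries:
        third_id = f"{convolved_id}-without-{deleted_id}"
        # With XOR convolution C = A ^ B, the chunk C ^ A still protects B.
        store[third_id] = xor_bytes(store[convolved_id], store[deleted_id])
        del store[convolved_id]
        created.append(third_id)
    for deleted_id, _ in log.entries:
        store.pop(deleted_id, None)
    log.entries.clear()
    return created
```

Batching the rewrites behind a threshold is what lets several deletions share one pass over the affected zones, which is the motivation the abstract gives for deferring generation of the third chunk.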
-
80.
Publication No.: US20210191633A1
Publication date: 2021-06-24
Application No.: US16726428
Filing date: 2019-12-24
Inventors: Mikhail Danilov, Yohannes Altaye
IPC classification: G06F3/06
Abstract: Affinity-sensitive storage of data corresponding to a doubly mapped redundant array of independent nodes, e.g., a doubly mapped cluster, in a real storage system, e.g., a real cluster, is disclosed. Different mappings of data to a doubly mapped cluster corresponding to real cluster storage locations can result in different levels of affinity between real disks and/or real nodes of the real cluster. A data storage scheme can be selected based on disk affinity scores and node affinity scores to provide access to stored data that can be more resilient against a real disk and/or a real node becoming less accessible. Further, data recovery from a real disk/node that has become less accessible can be improved where data is stored based on the disclosed disk affinity scores and/or node affinity scores.
-