Distributed data deduplication in a grid of processors

    Publication No.: US10255288B2

    Publication date: 2019-04-09

    Application No.: US14993225

    Filing date: 2016-01-12

    IPC classification: G06F17/30

    Abstract: Embodiments for distributed data deduplication in a grid of processors. Input data is received on a processor. The input data is partitioned into a plurality of similarity units. A corresponding deduplication metadata slice and owning processor for one of the similarity units is calculated. A representative value and corresponding digest values of the similarity unit are sent to the owning processor. The owning processor is used to search for the representative value in the deduplication metadata slice, and to send a specification and owning processors of calculated identical data sections to the processor. The processor is used to send nominal information of the calculated identical data sections to the owning processors of the data referenced by the calculated identical data sections.

    Storing data deduplication metadata in a grid of processors

    Publication No.: US10242021B2

    Publication date: 2019-03-26

    Application No.: US14993211

    Filing date: 2016-01-12

    IPC classification: G06F17/30 H04L9/06

    Abstract: Embodiments for storing data deduplication metadata in a grid of processors. Each of a plurality of slices of deduplication metadata is assigned to be stored by a corresponding processor in a grid of processors. Each slice of the plurality of slices includes at least one of a slice of a similarity index and groups of digests corresponding to those of a plurality of representative values in the slice of the similarity index. A hashing method is used to map between a plurality of input representative values and the plurality of slices of deduplication metadata.
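The mapping from representative values to metadata slices, and from slices to processors, might look roughly like the sketch below. The abstract does not name the hashing method, the number of slices, or the assignment policy, so SHA-256, `NUM_SLICES`, the round-robin assignment, and the names `slice_for` and `owner_for` are all assumptions made for illustration.

```python
import hashlib

NUM_SLICES = 16                        # assumed number of metadata slices
PROCESSORS = ["p0", "p1", "p2", "p3"]  # hypothetical grid members


def slice_for(representative_value: bytes, num_slices: int = NUM_SLICES) -> int:
    """Map a representative value to a slice index with a stable hash.

    SHA-256 is an assumption; the abstract only says "a hashing method".
    """
    digest = hashlib.sha256(representative_value).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def owner_for(slice_index: int, processors=PROCESSORS) -> str:
    """Return the processor that stores a slice (round-robin, an assumption)."""
    return processors[slice_index % len(processors)]


rep = b"example-representative-value"
s = slice_for(rep)
print(s, owner_for(s))
```

Because the hash is deterministic, any processor in the grid can compute the same slice and owner for a given representative value without consulting a central directory.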

    DISTRIBUTED DATA DEDUPLICATION IN A GRID OF PROCESSORS

    Publication No.: US20170199891A1

    Publication date: 2017-07-13

    Application No.: US14993225

    Filing date: 2016-01-12

    IPC classification: G06F17/30

    CPC classification: G06F17/30156

    Abstract: Embodiments for distributed data deduplication in a grid of processors. Input data is received on a processor. The input data is partitioned into a plurality of similarity units. A corresponding deduplication metadata slice and owning processor for one of the similarity units is calculated. A representative value and corresponding digest values of the similarity unit are sent to the owning processor. The owning processor is used to search for the representative value in the deduplication metadata slice, and to send a specification and owning processors of calculated identical data sections to the processor. The processor is used to send nominal information of the calculated identical data sections to the owning processors of the data referenced by the calculated identical data sections.

    Hierarchical management of storage capacity and data volumes in a converged system

    Publication No.: US10824355B2

    Publication date: 2020-11-03

    Application No.: US15403069

    Filing date: 2017-01-10

    IPC classification: G06F3/06

    Abstract: A computer-implemented method according to one embodiment includes identifying a plurality of storage resources. Additionally, the method includes creating a storage capacity, where the storage capacity has a first plurality of associated attributes. Further, the method includes defining one or more data volumes for the storage capacity, where each of the one or more data volumes has a second plurality of associated attributes and inherits the first plurality of associated attributes. Further still, the method includes configuring one or more volume shares for each data volume, where each of the volume shares has a third plurality of associated attributes and inherits the first plurality of associated attributes as well as the second plurality of associated attributes.
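The three-level attribute inheritance (storage capacity, data volume, volume share) can be illustrated with a small sketch. The attribute names and the helper `effective_attributes` are hypothetical, and the rule that a more specific level wins on a name clash is an assumption; the abstract only states that lower levels inherit the attributes of higher ones.

```python
from collections import ChainMap

# Hypothetical attributes for each level of the hierarchy.
capacity_attrs = {"tier": "ssd", "encrypted": True}  # first plurality
volume_attrs = {"size_gb": 500}                      # second plurality
share_attrs = {"protocol": "nfs"}                    # third plurality


def effective_attributes(*levels: dict) -> dict:
    """Merge attribute levels, most specific first.

    ChainMap looks keys up left to right, so a more specific level
    shadows an inherited value (an assumption, see the lead-in).
    """
    return dict(ChainMap(*levels))


volume_view = effective_attributes(volume_attrs, capacity_attrs)
share_view = effective_attributes(share_attrs, volume_attrs, capacity_attrs)
print(volume_view)
print(share_view)
```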

    Rebalancing distributed metadata

    Publication No.: US10261946B2

    Publication date: 2019-04-16

    Application No.: US14993220

    Filing date: 2016-01-12

    Abstract: Embodiments for rebalancing distributed deduplication metadata by a processor. An input similarity unit of data and a corresponding input representative value are received on an ingesting processor. A corresponding deduplication metadata slice and owning processor are calculated for the input similarity unit. The input representative value and input digest values are sent to the owning processor. The owning processor is used to search for the input representative value in a corresponding deduplication metadata slice, and to forward the input representative value and input digest values to an additional processor, if the input representative value is not found by the owning processor and a rebalancing status of the owning processor is in-process. The additional processor is used to send a reply message to the owning processor that facilitates migration of the input representative value and corresponding input digest values to the owning processor, if the input representative value is found.
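The forward-on-miss lookup during rebalancing could be sketched as follows. This is a simplified, single-process model: the `Processor` class, its `previous_owner` link, and the choice to delete the entry from the additional processor once it has been handed over are assumptions, and network messages are replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Processor:
    name: str
    # representative value -> digest values held in this processor's slice
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    rebalancing_in_process: bool = False
    previous_owner: Optional["Processor"] = None  # hypothetical link

    def lookup(self, rep_value: str) -> Optional[List[str]]:
        """Search the local slice, forwarding on a miss while rebalancing."""
        found = self.metadata.get(rep_value)
        if found is not None:
            return found
        if self.rebalancing_in_process and self.previous_owner is not None:
            # Forward the request to the additional processor.
            migrated = self.previous_owner.release(rep_value)
            if migrated is not None:
                # The reply lets the owning processor take over the entry.
                self.metadata[rep_value] = migrated
            return migrated
        return None

    def release(self, rep_value: str) -> Optional[List[str]]:
        """Hand over an entry if present, removing it locally (assumed)."""
        return self.metadata.pop(rep_value, None)


old = Processor("old", metadata={"r1": ["d1", "d2"]})
new = Processor("new", rebalancing_in_process=True, previous_owner=old)
print(new.lookup("r1"), "r1" in new.metadata, "r1" in old.metadata)
```

In this model, entries migrate lazily: an entry moves to its new owner the first time it is requested, so lookups stay correct while the rebalancing status is still in-process.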

    Automatic diagonal scaling of workloads in a distributed computing environment

    Publication No.: US10812407B2

    Publication date: 2020-10-20

    Application No.: US15819225

    Filing date: 2017-11-21

    Abstract: Embodiments for automatic diagonal scaling of workloads in a distributed computing environment. For each of a plurality of resources of each of a plurality of application instances, a determination is made as to whether a change in allocation of at least one of the plurality of resources is required. Operations requirements are computed for each of the plurality of application instances, the computed requirements including vertical increase and decrease operations, and horizontal split and collapse operations. The vertical decrease and horizontal collapse operations are first processed, the vertical increase and horizontal split operations are ordered, and the vertical increase and horizontal split operations are subsequently processed based on the ordering, thereby optimizing application efficiency and utilization of the plurality of resources in the distributed computing environment.
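The processing order described in the abstract (resource-releasing operations first, then the ordered resource-acquiring operations) can be sketched as below. The abstract does not say how the increase and split operations are ordered, so the `priority` field and the highest-first sort are assumptions, as are the names `Operation` and `schedule`.

```python
from dataclasses import dataclass
from typing import List

RELEASING = {"vertical_decrease", "horizontal_collapse"}
ACQUIRING = {"vertical_increase", "horizontal_split"}


@dataclass
class Operation:
    kind: str          # one of the four operation kinds above
    instance: str      # application instance the operation applies to
    priority: int = 0  # hypothetical ordering key for acquiring operations


def schedule(ops: List[Operation]) -> List[Operation]:
    """Return operations in processing order.

    Decrease and collapse operations come first, so the resources they
    free are available to the increase and split operations that follow.
    """
    releasing = [op for op in ops if op.kind in RELEASING]
    acquiring = sorted(
        (op for op in ops if op.kind in ACQUIRING),
        key=lambda op: op.priority,
        reverse=True,  # highest priority first (an assumption)
    )
    return releasing + acquiring


ops = [
    Operation("vertical_increase", "app-a", priority=1),
    Operation("horizontal_collapse", "app-b"),
    Operation("horizontal_split", "app-c", priority=5),
    Operation("vertical_decrease", "app-d"),
]
for op in schedule(ops):
    print(op.kind, op.instance)
```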

    Workload management with data access awareness by aggregating file locality information in a computing cluster

    Publication No.: US10761891B2

    Publication date: 2020-09-01

    Application No.: US15945921

    Filing date: 2018-04-05

    IPC classification: G06F9/50 G06F9/48 G06F16/14

    Abstract: Embodiments for workload management by aggregating locality information for a set of files in a cluster of hosts, from a file level to a level of the set of files in a cluster of hosts. To facilitate workload scheduling in the cluster, a subset of the set of files is selected. A set of storage size counters, each assigned to a host in the cluster, is reset. An overall storage size counter is reset, and the files in the subset of the set of files are scanned. For each scanned file, locality information of the file is retrieved and added to the storage size counters of the hosts, and a total size of the file is added to the overall storage size counter. An output proportion of the storage size counter of each host is then computed from the overall storage size counter.
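The counter-based aggregation in the abstract can be sketched in a few lines. The input representation (a total size plus a per-host byte count for each file) and the function name `aggregate_locality` are assumptions; how the subset of files is selected is left out.

```python
from typing import Dict, Iterable, List, Tuple

# (total file size, {host: bytes of the file stored on that host})
FileLocality = Tuple[int, Dict[str, int]]


def aggregate_locality(
    hosts: List[str], files: Iterable[FileLocality]
) -> Dict[str, float]:
    """Aggregate per-file locality into a per-host proportion."""
    per_host = {host: 0 for host in hosts}  # reset per-host counters
    overall = 0                             # reset overall counter
    for total_size, locality in files:
        for host, size in locality.items():
            per_host[host] = per_host.get(host, 0) + size
        overall += total_size
    if overall == 0:
        return {host: 0.0 for host in per_host}
    return {host: size / overall for host, size in per_host.items()}


files = [
    (100, {"a": 100, "b": 50}),  # fully on host a, half replicated on b
    (100, {"b": 100}),
]
print(aggregate_locality(["a", "b", "c"], files))
```

With replicated data, the proportions can sum to more than 1, since the same bytes may be counted on several hosts while the overall counter counts each file once.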

    HIERARCHICAL MANAGEMENT OF STORAGE CAPACITY AND DATA VOLUMES IN A CONVERGED SYSTEM

    Publication No.: US20180196608A1

    Publication date: 2018-07-12

    Application No.: US15403069

    Filing date: 2017-01-10

    IPC classification: G06F3/06

    Abstract: A computer-implemented method according to one embodiment includes identifying a plurality of storage resources. Additionally, the method includes creating a storage capacity, where the storage capacity has a first plurality of associated attributes. Further, the method includes defining one or more data volumes for the storage capacity, where each of the one or more data volumes has a second plurality of associated attributes and inherits the first plurality of associated attributes. Further still, the method includes configuring one or more volume shares for each data volume, where each of the volume shares has a third plurality of associated attributes and inherits the first plurality of associated attributes as well as the second plurality of associated attributes.