Processing device configured for efficient generation of compression estimates for datasets

    Publication No.: US11609883B2

    Publication date: 2023-03-21

    Application No.: US15991380

    Filing date: 2018-05-29

    Abstract: An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table. The scan criterion may comprise, for example, a designated content-based signature prefix, or a designated subset inclusion characteristic defining a polynomial-based signature subspace.
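
The scan described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the 4096-byte page size, SHA-1 as the content-based signature, zlib as the compressor, and the function name `estimate_compression` are all assumptions made for the example.

```python
import hashlib
import zlib

PAGE_SIZE = 4096  # assumed page size; the abstract does not specify one


def estimate_compression(dataset: bytes, prefix: str = "0", page_size: int = PAGE_SIZE):
    """Estimate compressed/original size ratio from a sampled subset of pages.

    A page is sampled only if its signature (hex SHA-1, an assumption)
    starts with the designated prefix, which plays the role of the scan criterion.
    """
    table = {}  # compression estimate table: signature -> (original_len, compressed_len)
    for offset in range(0, len(dataset), page_size):
        page = dataset[offset:offset + page_size]
        signature = hashlib.sha1(page).hexdigest()  # computation giving the page result
        if not signature.startswith(prefix):        # scan criterion not satisfied
            continue
        if signature not in table:                  # identical pages share one entry
            table[signature] = (len(page), len(zlib.compress(page)))
    if not table:
        return None  # no page satisfied the criterion, so no estimate is available
    original = sum(o for o, _ in table.values())
    compressed = sum(c for _, c in table.values())
    return compressed / original
```

With a one-character hex prefix, roughly one page in sixteen is expected to be sampled, so the table stays small relative to the dataset. Passing an empty prefix samples every page.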

    Changing page size in an address-based data storage system

    Publication No.: US11429531B2

    Publication date: 2022-08-30

    Application No.: US16779749

    Filing date: 2020-02-03

    Abstract: Host I/O requests directed to a logical storage volume are initially processed by accessing physical pages of non-volatile data storage having a default page size. An indication of an optimal page size for the logical storage volume is received, and the size of the physical pages of non-volatile data storage accessed to process host I/O requests directed to the logical storage volume is changed from the default page size to the optimal page size for the logical storage volume. The default page size is changed to the optimal page size for the logical storage volume by changing a size of physical pages of non-volatile data storage indicated by a mapping structure that maps logical addresses in an address space of the logical storage volume to corresponding physical pages of non-volatile data storage from the default page size to the optimal page size for the logical storage volume.

Techniques for workload balancing using dynamic path state modifications

    Publication No.: US20220206871A1

    Publication date: 2022-06-30

    Application No.: US17137988

    Filing date: 2020-12-30

    IPC classification: G06F9/50 G06F3/06

    Abstract: Rebalancing the workload of logical devices across multiple nodes may include dynamically modifying preferred paths for one or more logical devices in order to rebalance the I/O workload of the logical devices among the nodes of the data storage system. Determining whether to rebalance the I/O workload between the two nodes may be performed in accordance with one or more criteria. Processing may include monitoring the current workloads of both nodes over time and periodically evaluating, in accordance with the one or more criteria, whether the current workloads of the nodes are imbalanced. Responsive to determining, in accordance with the criteria, that rebalancing of workload between the nodes is needed, the rebalancing may be performed. A notification may be sent to the host regarding any path state changes made as a result of the workload rebalancing.
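
One way to picture the rebalancing step in this abstract is the sketch below. It is only an illustration under stated assumptions: the two nodes are named "A" and "B", the imbalance criterion is a relative load difference above a `threshold`, and the greedy choice of which volume to move is invented for the example. The returned list of changes stands in for the path state notifications sent to the host.

```python
def rebalance_paths(preferred, load, threshold=0.2):
    """Move preferred paths between two nodes until their loads are balanced.

    preferred: volume -> preferred node ("A" or "B")
    load:      volume -> measured I/O workload (for example, IOPS)
    Returns the new mapping and a list of (volume, old_node, new_node) changes.
    """
    preferred = dict(preferred)  # work on a copy; the caller's mapping is untouched
    changes = []
    while True:
        node_load = {"A": 0, "B": 0}
        for volume, node in preferred.items():
            node_load[node] += load[volume]
        total = node_load["A"] + node_load["B"]
        busy, idle = ("A", "B") if node_load["A"] >= node_load["B"] else ("B", "A")
        diff = node_load[busy] - node_load[idle]
        # Hypothetical criterion: relative difference must exceed the threshold.
        if total == 0 or diff / total <= threshold:
            break
        # Only moves that strictly shrink the difference are considered.
        candidates = [v for v, n in preferred.items()
                      if n == busy and 0 < load[v] < diff]
        if not candidates:
            break
        # Greedy choice: the volume whose load is closest to half the difference.
        volume = min(candidates, key=lambda v: abs(load[v] - diff / 2))
        preferred[volume] = idle
        changes.append((volume, busy, idle))
    return preferred, changes
```

Because each accepted move strictly reduces the load difference, the loop terminates. A real system would evaluate the criterion periodically over monitored workloads rather than on a single snapshot.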

    Cascading snapshot creation in a native replication 3-site configuration

    Publication No.: US11360688B2

    Publication date: 2022-06-14

    Application No.: US15971153

    Filing date: 2018-05-04

    IPC classification: G06F3/06 G06F11/14 G06F9/455

    Abstract: In one aspect a data replication process in a storage system includes creating, at a first target site, an empty container in a storage system. The empty container matches a container at a source site in response to initiation of an asynchronous data replication process. An aspect also includes transmitting a command to a second target site to create a container at the second target site. The first target site performs the asynchronous data replication process, which includes scanning the data upon receipt from the source site for a first target replication cycle and transmitting the scanned data to the container at the second target site for a second target replication cycle.

    Dynamic balancing of input/output (IO) operations for a storage system

    Publication No.: US11301138B2

    Publication date: 2022-04-12

    Application No.: US16516670

    Filing date: 2019-07-19

    IPC classification: G06F3/06

    Abstract: In one aspect, performing dynamic balancing of input/output (IO) operations includes providing a first queue for a first storage unit and a second queue for a second storage unit. The queues are configured to receive IO requests directed to the storage units. An aspect also includes determining a quality of service (QoS) value assigned to each of the storage units, pulling entries from the queues at a rate that accords with the QoS value, executing IOs, and monitoring bandwidth of the IO operations. Upon determining the bandwidth is not in alignment with the QoS value for either of the first and second storage units, a further aspect includes modifying the rate at which entries are pulled from at least one of the queues, and continuing to monitor the bandwidth and modify the rate until the bandwidth aligns with the QoS value assigned to each of the storage units.
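
The feedback loop in this abstract (pull, execute, monitor, adjust) can be sketched as a small simulation. The sketch makes several assumptions that are not in the abstract: each queue entry is just an IO size, "executing" an IO means counting its size toward bandwidth, the QoS value is a target bandwidth per round, and the rate is adjusted by one entry per round within a `tolerance` band.

```python
from collections import deque


def balance_io(queues, qos_target, tolerance=0.05, max_rounds=100):
    """Adjust per-queue pull rates until measured bandwidth matches each QoS target.

    queues:     storage unit name -> deque of pending IO sizes
    qos_target: storage unit name -> target bandwidth per round (same units as sizes)
    Returns the final pull rate (entries per round) for each storage unit.
    """
    rate = {name: 1 for name in queues}  # start by pulling one entry per round
    for _ in range(max_rounds):
        aligned = True
        for name, queue in queues.items():
            count = min(rate[name], len(queue))
            pulled = [queue.popleft() for _ in range(count)]
            bandwidth = sum(pulled)  # monitored bandwidth for this round
            target = qos_target[name]
            if bandwidth > target * (1 + tolerance):
                rate[name] = max(1, rate[name] - 1)  # too fast: pull fewer entries
                aligned = False
            elif bandwidth < target * (1 - tolerance):
                rate[name] += 1                      # too slow: pull more entries
                aligned = False
        if aligned:
            break  # bandwidth aligns with the QoS value of every storage unit
    return rate
```

For example, with 8-unit IOs and a target of 64 on one queue and 4-unit IOs and a target of 20 on the other, the rates settle at 8 and 5 entries per round respectively.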

    Techniques for data migration

    Publication No.: US11281390B2

    Publication date: 2022-03-22

    Application No.: US16818173

    Filing date: 2020-03-13

    IPC classification: G06F3/06

    Abstract: Techniques for data migration may include: copying data of a source logical device of a source system to a target logical device of a target system; during said copying, receiving at the target system an I/O operation directed to a logical address of the target logical device and intercepting the I/O operation on the target system; determining, on the target system, to request from the source system a data page stored at the logical address; responsive to determining to request the data page stored, performing processing including: issuing a request to the source system for the data page stored at the logical address; and responsive to receiving said request, sending information from the source system to the target system, wherein the information includes the data page stored at the logical address and additional logical addresses of the source logical device at which the data page is stored.

    Direct input/output path to compressed data

    Publication No.: US11269776B2

    Publication date: 2022-03-08

    Application No.: US16656222

    Filing date: 2019-10-17

    Abstract: Techniques for providing a direct IO path to compressed data on storage media of a storage system. The techniques include triggering a transaction cache to perform a flush operation for updating mapping metadata for a storage object containing the compressed data. Having updated the mapping metadata for the storage object, the techniques further include issuing, by a copier module, an IO read request for the compressed data of the storage object to a namespace layer, which issues the IO read request to a mapping layer. The techniques further include forwarding the IO read request to a logical layer of the mapping layer, bypassing the transaction cache. The techniques further include reading, by the logical layer, the compressed data of the storage object from the storage media, and providing, via the mapping layer and the namespace layer, the compressed data to the copier module for transfer to a destination storage system.

    Provenance-based replication in a storage system

    Publication No.: US11238063B2

    Publication date: 2022-02-01

    Application No.: US16521728

    Filing date: 2019-07-25

    IPC classification: G06F16/27 G06F16/23

    Abstract: In one aspect, provenance-based replication includes assigning a GUID to a first snap tree of a first storage array and another GUID to a second snap tree of a second storage array. The trees are peers of each other with respect to at least one volume replicated between the arrays. For each volume in the first array that is replicated to a volume in the second array, an aspect includes assigning a volume pairing identifier common to both volumes. Upon determining data for a volume (V1) in the first array has been lost/corrupted, an aspect includes identifying the peer tree from the GUID and using the pairing ID of V1 to search the peer tree for a volume (V2) in the second array, retrieving data for V2, computing a delta between the data of V1 and the data of V2, and reconstructing the lost/corrupted data for V1 using the delta.

    Caching techniques for migrating and replicating data

    Publication No.: US11237964B2

    Publication date: 2022-02-01

    Application No.: US16398427

    Filing date: 2019-04-30

    Abstract: Techniques for processing data include: receiving a hierarchical structure of metadata (MD) pages for a logical device; and performing processing to copy data of the logical device from a source system to a target system. The processing includes: determining a sequence of the MD pages in accordance with a depth first traversal of the hierarchical structure; defining a cache management policy in accordance with the sequence that indicates when to load the MD pages into a cache and when to remove the MD pages from the cache; loading MD pages into, and removing MD pages from, the cache in accordance with the cache management policy; and copying data pages stored at logical addresses of the logical device in an order in which the logical addresses are accessed using MD pages stored in the cache at various points in time in accordance with the cache management policy.
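
The relationship between the depth-first sequence and the cache policy in this abstract can be illustrated with a short sketch. The representation is an assumption made for the example: the MD hierarchy is a dictionary mapping each MD page to its children, any child that is not itself a key is treated as a logical address of a data page, and the policy is expressed as a list of `load`, `copy`, and `evict` events. The function name `plan_copy` is hypothetical.

```python
def plan_copy(tree, root):
    """Derive load/copy/evict events from a depth-first walk of MD pages.

    tree: MD page id -> list of children; a child that is not a key of `tree`
          is treated as a logical address whose data page should be copied.
    """
    events = []

    def visit(page):
        events.append(("load", page))       # MD page enters the cache when first needed
        for child in tree[page]:
            if child in tree:
                visit(child)                # descend into a lower-level MD page
            else:
                events.append(("copy", child))  # copy the data page at this address
        events.append(("evict", page))      # subtree finished: page no longer needed

    visit(root)
    return events


def peak_cache_size(events):
    """Largest number of MD pages held in the cache at any point in the plan."""
    held = peak = 0
    for kind, _ in events:
        if kind == "load":
            held += 1
            peak = max(peak, held)
        elif kind == "evict":
            held -= 1
    return peak
```

Under this policy the cache never holds more MD pages than the depth of the hierarchy, and data pages are copied in the order in which the traversal reaches their logical addresses.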

Performance of replication system with small number of IP links available

    Publication No.: US20210278970A1

    Publication date: 2021-09-09

    Application No.: US16811000

    Filing date: 2020-03-06

    IPC classification: G06F3/06

    Abstract: A method is provided for use in a storage system, comprising: identifying a first process that is arranged to execute a first type-1 node and a first type-2 node of the storage system, the first type-1 node being assigned a communication link for transmitting replication data to a target system, the first type-2 node being arranged to execute I/O requests associated with a first set of addresses in an address space; identifying a second process that is arranged to execute a second type-1 node and a second type-2 node of the storage system, the second type-1 node not being assigned any communication link for transmitting replication data to a target system, the second type-2 node being arranged to execute I/O requests associated with a second set of addresses in the address space; and transferring at least one of the addresses in the first set to the second set.