Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.
Abstract:
Techniques for data synchronization policies are described. In one or more implementations, techniques may be employed to set data synchronization (“sync”) policies for devices in a data sync environment. The sync policies specify parameters for sync operations in the sync environment, such as how frequently data sync operations are performed, what types of data are synced to particular devices, how frequently particular types of data are synced, and so on. In implementations, the sync policies consider the number of devices that are participating in a sync environment and attributes of the devices in specifying parameters for sync operations. Data can be synchronized among devices in the sync environment based on the sync policies.
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
Abstract:
Techniques for data synchronization policies are described. In one or more implementations, techniques may be employed to set data synchronization (“sync”) policies for devices in a data sync environment. The sync policies specify parameters for sync operations in the sync environment, such as how frequently data sync operations are performed, what types of data are synced to particular devices, how frequently particular types of data are synced, and so on. In implementations, the sync policies consider the number of devices that are participating in a sync environment and attributes of the devices in specifying parameters for sync operations. Data can be synchronized among devices in the sync environment based on the sync policies.
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
Abstract translation:主题公开涉及一种数据重复数据删除技术,其中散列索引服务的索引在诸如硬盘驱动器的辅助存储设备中维护散列索引,以及RAM中的紧凑索引表和预先高速缓存,其操作以减少 I / O在重复数据消除操作期间访问辅助存储设备。 还描述了用于在重复数据删除会话期间维护数据的会话高速缓存,以及用于效率的只读压缩索引表的编码。
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
Abstract translation:主题公开涉及一种数据重复数据删除技术,其中散列索引服务的索引在诸如硬盘驱动器的辅助存储设备中维护散列索引,以及RAM中的紧凑索引表和预先高速缓存,其操作以减少 I / O在重复数据消除操作期间访问辅助存储设备。 还描述了用于在重复数据删除会话期间维护数据的会话高速缓存,以及用于效率的只读压缩索引表的编码。