-
Publication number: US08380681B2
Publication date: 2013-02-19
Application number: US12970839
Filing date: 2010-12-16
IPC classification: G06F17/00
CPC classification: G06F17/30091, G06F17/3007
Abstract: The subject disclosure is directed towards data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.
-
Publication number: US20120158672A1
Publication date: 2012-06-21
Application number: US12970839
Filing date: 2010-12-16
IPC classification: G06F17/30
CPC classification: G06F17/30091, G06F17/3007
Abstract: The subject disclosure is directed towards data deduplication (optimization) performed by phases/modules of a modular data deduplication pipeline. At each phase, the pipeline allows modules to be replaced, selected or extended, e.g., different algorithms can be used for chunking or compression based upon the type of data being processed. The pipeline facilitates secure data processing, batch processing, and parallel processing. The pipeline is tunable based upon feedback, e.g., by selecting modules to increase deduplication quality, performance and/or throughput. Also described is selecting, filtering, ranking, sorting and/or grouping the files to deduplicate, e.g., based upon properties and/or statistical properties of the files and/or a file dataset and/or internal or external feedback.
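Illustrative note: the abstract describes a pipeline whose chunking and compression modules can be swapped according to the type of data being processed. The following is a minimal sketch of that idea only, not the patented implementation. Every name in it (`DedupPipeline`, `fixed_chunker`, `line_chunker`), the use of a file extension as the "type of data", and the choice of SHA-256 and zlib are assumptions made for illustration.

```python
import hashlib
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical module interfaces: a chunker splits bytes into chunks,
# a compressor transforms one chunk before it is stored.
Chunker = Callable[[bytes], Iterable[bytes]]
Compressor = Callable[[bytes], bytes]


def fixed_chunker(size: int) -> Chunker:
    """Build a chunker that cuts data into fixed-size pieces."""
    def chunk(data: bytes) -> Iterable[bytes]:
        for start in range(0, len(data), size):
            yield data[start:start + size]
    return chunk


def line_chunker(data: bytes) -> Iterable[bytes]:
    """Chunker that places a boundary after every line ending."""
    return data.splitlines(keepends=True)


@dataclass
class DedupPipeline:
    """Selects a chunker and a compressor per file extension (an assumption)."""
    chunkers: Dict[str, Chunker]
    compressors: Dict[str, Compressor]
    default_chunker: Chunker
    default_compressor: Compressor
    store: Dict[str, bytes] = field(default_factory=dict)  # digest -> stored chunk

    def process(self, name: str, data: bytes) -> List[str]:
        """Deduplicate one file and return its list of chunk digests."""
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        chunker = self.chunkers.get(ext, self.default_chunker)
        compressor = self.compressors.get(ext, self.default_compressor)
        recipe: List[str] = []
        for chunk in chunker(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.store:
                # Only chunks not seen before are compressed and stored.
                self.store[digest] = compressor(chunk)
            recipe.append(digest)
        return recipe


# Example wiring: text files are chunked by line, everything else in 4-byte
# pieces; already-compressed ".zip" data is stored without recompression.
pipeline = DedupPipeline(
    chunkers={"txt": line_chunker},
    compressors={"zip": lambda chunk: chunk},
    default_chunker=fixed_chunker(4),
    default_compressor=zlib.compress,
)
```

In this sketch, replacing a module amounts to changing one dictionary entry, which mirrors the replace/select/extend behavior the abstract attributes to each phase. The sketch does not record which compressor was applied to each stored chunk, so reconstructing files would need that extra bookkeeping.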
-