System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified
    21.
    Granted invention patent
    Legal status: in force

    Publication No.: US07281006B2

    Publication date: 2007-10-09

    Application No.: US10693284

    Filing date: 2003-10-23

    IPC classification: G06F17/30 G06F11/00

    Abstract: A data chunking system divides data into predominantly fixed-sized chunks such that duplicate data may be identified. The data chunking system may be used to reduce data storage requirements and save network bandwidth by allowing storage or transmission of primarily unique data chunks. The system may also be used to increase reliability in data storage and network transmission, by allowing an error affecting a data chunk to be repaired with an identified duplicate chunk. The data chunking system chunks data by selecting a chunk of fixed size, then moving a window along the data until a match to existing data is found. As the window moves across the data, unique chunks predominantly of fixed size are formed in the data passed over. Several embodiments provide alternate methods of determining whether a selected chunk matches existing data and methods by which the window is moved through the data. To locate duplicate data, the data chunking system remembers data by computing a mathematical function of a data chunk and inserting the computed value into a hash table.

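The chunking procedure in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the names `fingerprint` and `chunk_data`, the use of SHA-1 as the "mathematical function", and the byte-by-byte window movement are all assumptions made for illustration, and a Python `set` stands in for the hash table.

```python
import hashlib


def fingerprint(chunk: bytes) -> bytes:
    """Value remembered in the hash table for a chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).digest()


def chunk_data(data: bytes, size: int, seen: set) -> list:
    """Split data into ("new", chunk) and ("dup", chunk) pairs.

    `seen` holds fingerprints of chunks already stored and is updated in place.
    """
    out = []
    start = 0  # first byte not yet assigned to a chunk
    pos = 0    # left edge of the sliding window
    while pos + size <= len(data):
        window = data[pos:pos + size]
        if fingerprint(window) in seen:
            if pos > start:
                # Data passed over before the match becomes a short unique chunk.
                passed = data[start:pos]
                seen.add(fingerprint(passed))
                out.append(("new", passed))
            out.append(("dup", window))
            pos += size
            start = pos
            continue
        pos += 1
        if pos - start == size:
            # A full fixed-size span was passed over without finding a match.
            chunk = data[start:pos]
            seen.add(fingerprint(chunk))
            out.append(("new", chunk))
            start = pos
    if start < len(data):
        # Whatever remains at the end is stored as a final unique chunk.
        tail = data[start:]
        seen.add(fingerprint(tail))
        out.append(("new", tail))
    return out
```

With a chunk size of 4, chunking `b"ABCDEFGH"` and then `b"xyABCDEFGH"` against the same `seen` table reports the two-byte prefix as new data and both four-byte chunks as duplicates, even though the insertion shifted their offsets. Hashing every window from scratch is quadratic in the worst case; a rolling hash would be the usual refinement.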

    System and method for providing a cost-adaptive cache
    22.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US07143240B2

    Publication date: 2006-11-28

    Application No.: US10698897

    Filing date: 2003-10-31

    IPC classification: G06F12/12

    Abstract: A cost-adaptive cache dynamically maximizes performance in a caching system by preferentially caching data according to the cost of replacing that data. The cost-adaptive cache includes a partitioned real cache, wherein data is stored in each of the real cache partitions according to its replacement cost. The cost-adaptive cache also includes a partitioned phantom cache that provides a directory of information pertaining to blocks of data which do not qualify for inclusion in the real cache. The partitions in the phantom cache correspond to the partitions in the real cache. Moreover, the cost-adaptive cache maximizes performance in a system by preferentially caching data that is more costly to replace. In one embodiment of the system, the cost of replacing a block of data is estimated by the previous cost incurred to fetch that block of data.


    System and method for detecting and sharing common blocks in an object storage system
    23.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US07076622B2

    Publication date: 2006-07-11

    Application No.: US10674375

    Filing date: 2003-09-30

    IPC classification: G06F12/00

    Abstract: A system and method of optimizing storage of common data blocks within a networked storage system comprises: receiving a data block to be stored in the networked storage system; analyzing contents of the received data block to determine how many copies of the data block the system is entrusted to store and how many copies of the data block exist within the system, and to identify a location of each copy of the data block within the system; identifying performance and reliability requirements of the system; determining an optimal number of copies of the received data block to store in the system, wherein the determination is made according to the number of copies of the data block the system is entrusted to store together with the identified performance and reliability requirements of the system; and maintaining the optimal number of copies of the received data block within the system.


    Storage system and method for reorganizing data to improve prefetch effectiveness and reduce seek distance
    24.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US06963959B2

    Publication date: 2005-11-08

    Application No.: US10286485

    Filing date: 2002-10-31

    IPC classification: G06F3/06 G06F12/08 G06F12/00

    Abstract: A data storage system and method for reorganizing data to improve the effectiveness of data prefetching and reduce the data seek distance. A data reorganization region is allocated in which data is reorganized to service future requests for data. Sequences of data units that have been repeatedly requested are determined from a request stream, preferably using a graph where each vertex of the graph represents a requested data unit and each edge represents that a destination unit is requested shortly after a source unit and the frequency of this occurrence. The most frequently requested data units are also determined from the request stream. The determined data is copied into the reorganization region and reorganized according to the determined sequences and most frequently requested units. The reorganized data might then be used to service future requests for data.

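The request graph described in the abstract above can be sketched as two counters: one for edges (a destination unit requested shortly after a source unit) and one for per-unit request frequency. The function name `build_request_graph` and the `lookahead` parameter, which defines "shortly after" as "within the next N requests", are assumptions for illustration and are not taken from the patent.

```python
from collections import Counter, deque


def build_request_graph(requests, lookahead=1):
    """Return (edges, frequency) counters for a stream of requested data units.

    edges[(src, dst)] counts how often dst was requested within `lookahead`
    requests after src; frequency[unit] counts requests for each unit.
    """
    edges = Counter()
    frequency = Counter()
    recent = deque(maxlen=lookahead)  # the last `lookahead` requested units
    for unit in requests:
        frequency[unit] += 1
        for src in recent:
            if src != unit:
                edges[(src, unit)] += 1
        recent.append(unit)
    return edges, frequency
```

Heavily weighted edges suggest sequences worth laying out contiguously in the reorganization region, and `frequency.most_common()` gives candidates for the most frequently requested units.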

    Method and apparatus to prefetch sequential pages in a multi-stream environment
    25.
    Granted invention patent
    Legal status: in force

    Publication No.: US06567894B1

    Publication date: 2003-05-20

    Application No.: US09456539

    Filing date: 1999-12-08

    IPC classification: G06F12/00

    CPC classification: G06F12/0862 G06F2212/6026

    Abstract: The present invention is a system and method for determining information that is to be prefetched in a multi-stream environment, in which multiple data accessing streams perform sequential accesses to information independently of each other. It can detect sequential streams within the aggregate reference stream and yet requires relatively little memory to operate. A reference address referencing stored information is received. A matching run is found. A count corresponding to the run is updated. If the count exceeds a predetermined threshold, an amount of information to prefetch is determined. If a predetermined fraction of the determined amount of information to prefetch must still be retrieved, the determined amount of information is retrieved. A matching run may be found by searching a stack comprising a plurality of entries to find an entry corresponding to the reference address. Each of the plurality of entries may be associated with a maximum accessed address, a forward range, and a backward range, and the searching step may comprise searching the plurality of stack entries in one direction starting at an end of the stack and determining whether the reference address is between (maximum accessed address−backward range) and (maximum accessed address+forward range) for each stack entry until a matching stack entry is found.

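The run-matching step in the abstract above can be sketched as follows. This is an illustrative reading, not the patented method: the class names, the default ranges, the stack capacity, the fixed prefetch amount, and the move-to-top policy are all assumptions, and the check on the fraction of the prefetch amount still to be retrieved is omitted.

```python
from dataclasses import dataclass


@dataclass
class Run:
    max_addr: int       # maximum accessed address of this run
    forward: int = 4    # forward range (assumed default)
    backward: int = 1   # backward range (assumed default)
    count: int = 1      # references matched to this run so far


class RunDetector:
    def __init__(self, capacity=8, threshold=2, prefetch_pages=8):
        self.stack = []  # most recently matched run sits at the end
        self.capacity = capacity
        self.threshold = threshold
        self.prefetch_pages = prefetch_pages

    def reference(self, addr: int) -> int:
        """Record one reference; return how many pages to prefetch (0 for none)."""
        # Search in one direction, starting at the end of the stack.
        for i in range(len(self.stack) - 1, -1, -1):
            run = self.stack[i]
            if run.max_addr - run.backward <= addr <= run.max_addr + run.forward:
                run.count += 1
                run.max_addr = max(run.max_addr, addr)
                self.stack.append(self.stack.pop(i))  # move the run to the top
                return self.prefetch_pages if run.count > self.threshold else 0
        # No matching run: start a new one, evicting the oldest if full.
        if len(self.stack) == self.capacity:
            self.stack.pop(0)
        self.stack.append(Run(max_addr=addr))
        return 0
```

Feeding two interleaved sequential streams (addresses 100, 500, 101, 501, 102, 502) keeps one stack entry per stream, and each stream triggers prefetching on its third reference under the assumed threshold of 2.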

    System and method for effecting information governance
    26.
    Granted invention patent
    Legal status: in force

    Publication No.: US08131677B2

    Publication date: 2012-03-06

    Application No.: US12130976

    Filing date: 2008-05-30

    IPC classification: G06F17/30

    Abstract: A method to manage data located on networked devices is provided. The method includes replicating objects residing on the devices and collecting information about at least one of the objects or the devices. The method further includes receiving input on desired information governance policies and outcomes and analyzing the replicated objects, collected information and received input to determine an information governance action.


    System and method for providing a trustworthy inverted index to enable searching of records
    27.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US07765215B2

    Publication date: 2010-07-27

    Application No.: US11466173

    Filing date: 2006-08-22

    IPC classification: G06F17/00 G06F17/30

    CPC classification: G06F17/30631 G06F21/64

    Abstract: A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache a tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of the posting lists based on a desired level of insertion performance, a query performance, or a size of the storage cache, and reads a posting list corresponding to a search feature in a query to identify records that comprise the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.

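One way to picture mapping dictionary features onto a chosen number of posting lists is the sketch below. It is an assumption-laden illustration only: the class `MergedPostingIndex`, the CRC32-modulo mapping, and storing the feature alongside each record ID are choices made here, and the storage-cache tail and jump pointers from the abstract are not modeled.

```python
import zlib


class MergedPostingIndex:
    """Append-only inverted index in which several features share one posting list."""

    def __init__(self, num_lists: int):
        self.lists = [[] for _ in range(num_lists)]

    def _slot(self, feature: str) -> int:
        # Deterministic mapping from a feature to one of the posting lists.
        return zlib.crc32(feature.encode("utf-8")) % len(self.lists)

    def add(self, record_id: int, features) -> None:
        # Entries are only appended, so record IDs grow along each list
        # as long as records are indexed in increasing ID order.
        for feature in sorted(set(features)):
            self.lists[self._slot(feature)].append((record_id, feature))

    def search(self, feature: str) -> list:
        # Read the shared list and keep only entries for the searched feature.
        return [rid for rid, f in self.lists[self._slot(feature)] if f == feature]
```

Fewer lists mean fewer list tails to keep cached during insertion, at the price of scanning entries for unrelated features at query time, which is the trade-off the abstract describes when choosing the number of posting lists.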

    Reducing data loss and unavailability by integrating multiple levels of a storage hierarchy
    28.
    Invention patent application
    Legal status: lapsed

    Publication No.: US20090193289A1

    Publication date: 2009-07-30

    Application No.: US12186676

    Filing date: 2008-08-06

    IPC classification: G06F11/14

    Abstract: A method for reducing data loss and unavailability by integrating multiple levels of a storage hierarchy is provided. The method includes receiving a read request. In addition, the method includes recognizing a data failure in response to the read request. The method further includes locating an alternate source of the data to be read in response to recognizing the data failure. The alternate source includes data cached at devices in the storage hierarchy, data in a backup system, and cumulative changes to the data since the last backup. Moreover, the method includes responding to the read request with data from the alternate source.


    System and Method for Content-based Object Ranking to Facilitate Information Lifecycle Management
    29.
    Invention patent application
    Legal status: lapsed

    Publication No.: US20080161885A1

    Publication date: 2008-07-03

    Application No.: US11617585

    Filing date: 2006-12-28

    IPC classification: A61N1/00

    Abstract: A method to manage objects in an information lifecycle management system is provided. The method includes determining a score for each of the objects based on a score of at least one feature within the respective object, where the score of the at least one feature is associated with a valuation of that feature. The method also includes managing each of the objects based on the score for each of the objects, wherein higher scored objects are managed preferentially.

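The scoring idea in the abstract above can be shown with a small sketch. The abstract does not say how feature scores are combined, so summing the valuations of the distinct features in each object is an assumption, as are the names `object_score` and `rank_objects`.

```python
def object_score(features, feature_values):
    """Sum the valuations of the distinct features found in one object."""
    return sum(feature_values.get(f, 0.0) for f in set(features))


def rank_objects(objects, feature_values):
    """Return object names ordered from highest to lowest score."""
    return sorted(
        objects,
        key=lambda name: object_score(objects[name], feature_values),
        reverse=True,
    )
```

Objects at the front of the returned list would be the ones managed preferentially, for example by being placed on more reliable storage or retained longer.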

    Method and apparatus for supporting parity protection in a RAID clustered environment
    30.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US06950901B2

    Publication date: 2005-09-27

    Application No.: US09755858

    Filing date: 2001-01-05

    Abstract: The present invention discloses a method, apparatus, and article of manufacture for implementing a locking structure for supporting parity protection in a RAID clustered environment. When updating parity, the parity is locked so that other nodes cannot access or modify the parity. Accordingly, the parity is locked, read, generated, written, and unlocked by a node. An enhanced protocol may combine the lock and read functions and the write and unlock functions. Further, the SCSI RESERVE and RELEASE commands may be utilized to lock/unlock the parity data. By locking the parity in this manner, overhead is minimized and does not increase as the number of nodes increases.

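The lock, read, generate, write, unlock sequence in the abstract above can be sketched in a single process. Everything here is a stand-in chosen for illustration: a `threading.Lock` plays the role of the parity lock (the abstract mentions SCSI RESERVE and RELEASE for that purpose), threads play the role of cluster nodes, and the `Stripe` class and its XOR parity update are assumptions rather than the patented protocol.

```python
import threading
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """One RAID stripe: several data blocks plus a single XOR parity block."""

    def __init__(self, blocks):
        self.data = [bytes(b) for b in blocks]
        self.parity = reduce(xor_bytes, self.data)
        self.parity_lock = threading.Lock()

    def write(self, index: int, new: bytes) -> None:
        # Lock the parity so no other writer can read or modify it meanwhile.
        with self.parity_lock:
            old = self.data[index]
            # Generate: new parity = old parity XOR old data XOR new data.
            self.parity = xor_bytes(self.parity, xor_bytes(old, new))
            # The data block is updated under the same lock to keep the sketch simple.
            self.data[index] = new
        # Leaving the with-block releases the lock.
```

Because each writer holds only the one parity lock for the stripe it touches, the locking cost per update stays constant however many writers exist, which mirrors the overhead claim in the abstract.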