Abstract:
Representing a number of assets on an originating computer begins with selecting the assets to be represented. Cryptographic hash asset identifiers are generated; each of the asset identifiers is computed using the contents of a particular asset. The asset identifier is a content-based or content-addressable asset name for the asset and is location independent. An asset list is generated that includes the asset identifiers computed from the assets. A cryptographic hash asset list identifier is generated that is computed from the asset list. The asset list identifier is stored for later retrieval. The assets selected are also stored for safekeeping either locally or on a computer network. In the event of loss of the files from the originating computer, the asset list identifier is retrieved. Using the asset list identifier, the original asset list is found and retrieved from its safe location. The asset identifiers from the retrieved asset list are used to find and retrieve the individual assets from their backup locations. The assets are verified by recomputing the cryptographic hash asset identifier for each asset retrieved and comparing it to the asset identifier from the asset list. The MD5 algorithm is used for the cryptographic hash function. Assets are retrieved using a multicast protocol. A series of importer programs searches for assets to retrieve in progressively more remote locations. Assets are retrieved whole or in segments.
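The backup and restore flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `store` dictionary stands in for the backup locations, the newline-separated asset list format is an assumption, and the function names are hypothetical. MD5 is used because the abstract names it.

```python
import hashlib


def asset_id(content: bytes) -> str:
    """Content-based identifier: the MD5 digest of the asset's bytes."""
    return hashlib.md5(content).hexdigest()


def build_asset_list(assets: list[bytes]) -> bytes:
    # Assumed serialization: one hex identifier per line.
    return "\n".join(asset_id(a) for a in assets).encode("ascii")


def backup(assets: list[bytes], store: dict) -> str:
    """Store each asset and the asset list under their identifiers.

    Returns the asset list identifier, which is all the caller must keep.
    """
    for a in assets:
        store[asset_id(a)] = a
    asset_list = build_asset_list(assets)
    list_id = asset_id(asset_list)
    store[list_id] = asset_list
    return list_id


def restore(list_id: str, store: dict) -> list[bytes]:
    """Fetch the asset list, then each asset, verifying every hash."""
    asset_list = store[list_id]
    if asset_id(asset_list) != list_id:
        raise ValueError("asset list failed verification")
    restored = []
    for ident in asset_list.decode("ascii").splitlines():
        content = store[ident]
        if asset_id(content) != ident:
            raise ValueError(f"asset {ident} failed verification")
        restored.append(content)
    return restored
```

Because every identifier is derived from content, a corrupted or substituted asset is detected at restore time without trusting the storage location.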
Abstract:
Access to content addressable data on a network is facilitated using digital information storing devices or data repositories (“silos”) that monitor broadcast data requests over the network. A number of silos automatically monitor both data requests and data itself that are broadcast over a network. The silos selectively store data. Each silo responds to data requests broadcast over the network with data the silo has previously intercepted. A content addressable file scheme is used to enable the data repositories to reliably identify data being requested. When a data request is received, each silo evaluates whether it has all or a portion of the data being requested and responds to requests when it has the data. Requests for data are implemented by broadcasting a cryptographic hash data identifier of the data file needed. The data identifier is used by a silo to determine which data to receive and store.
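A rough sketch of the silo behavior follows. The network broadcast is simulated by plain method calls, the `Silo` class and its `wanted` selection policy are hypothetical, and SHA-256 is an assumed hash function since the abstract does not name one. Partial-file responses are left out for brevity.

```python
import hashlib


def data_id(data: bytes) -> str:
    # Assumed hash function; the abstract only says "cryptographic hash".
    return hashlib.sha256(data).hexdigest()


class Silo:
    def __init__(self, wanted=lambda data: True):
        self._store = {}
        self._wanted = wanted  # hypothetical policy for selective storage

    def on_broadcast_data(self, data: bytes) -> None:
        """Intercept data seen on the network and keep it if selected."""
        if self._wanted(data):
            self._store[data_id(data)] = data

    def on_broadcast_request(self, requested_id: str):
        """Answer only when this silo holds the requested data."""
        return self._store.get(requested_id)


def broadcast_request(silos, requested_id: str):
    """Simulated broadcast: return the first response that verifies."""
    for silo in silos:
        data = silo.on_broadcast_request(requested_id)
        if data is not None and data_id(data) == requested_id:
            return data
    return None
```

The requester can verify any response by rehashing it, so it does not need to know or trust which silo answered.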
Abstract:
One embodiment is directed to a method for use in a computer system comprising at least first and second computers, wherein the first provides content addressable storage. A request to access a unit of data stored by the first computer is issued by the second computer and received by the first. In one embodiment, the unit of data comprises a first identifier identifying at least one digital asset and metadata relating to the at least one digital asset, and the request identifies the unit of data via a second identifier based, at least in part, on the content of the unit of data. In another embodiment, a request to access a unit of data is sent from a second computer and received at a first computer and identifies the unit of data via a content identifier based, at least in part, on the content of the unit of data. The content identifier is the only identifier that can be used in communication between the first and second computers to identify the unit of data.
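One way to picture the first embodiment is a descriptor that holds an asset's identifier plus metadata and is itself addressed by a hash of its own bytes. The sketch below assumes SHA-256 and a JSON encoding, neither of which is stated in the abstract; `Server` and `make_descriptor` are hypothetical names.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    # Assumed hash function for content-based identifiers.
    return hashlib.sha256(data).hexdigest()


def make_descriptor(asset: bytes, metadata: dict) -> bytes:
    """Unit of data: the asset's identifier (first identifier) plus metadata."""
    record = {"asset_id": content_id(asset), "metadata": metadata}
    # sort_keys gives a stable encoding, so equal records hash equally.
    return json.dumps(record, sort_keys=True).encode("utf-8")


class Server:
    """Stands in for the first computer providing content addressable storage."""

    def __init__(self):
        self._units = {}

    def put(self, unit: bytes) -> str:
        uid = content_id(unit)
        self._units[uid] = unit
        return uid

    def get(self, uid: str) -> bytes:
        # The content identifier is the only handle the client uses.
        return self._units[uid]
```

A client that holds only the descriptor's identifier can fetch the descriptor, read the asset identifier inside it, and then fetch the asset the same way.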
Abstract:
Techniques are described for managing the storing of a digital asset by a first computer in a computer system that further comprises a second computer. The first computer receives a communication from the second computer including the digital asset, determines whether the digital asset has been stored by the first computer, and when it has not, stores a copy of the digital asset. The digital asset is identified by a first identifier based, at least in part, on its content. In one embodiment, the first identifier is the only identifier that can be used to identify the digital asset in communication between the first and second computers. In another embodiment, the computer system further stores a unit of data comprising the first identifier and metadata relating to the digital asset.
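The store-only-if-absent check can be sketched in a few lines. The `AssetStore` class is hypothetical and SHA-256 is an assumed choice of hash; the abstract only requires that the identifier be based on the asset's content.

```python
import hashlib


class AssetStore:
    """Stands in for the first computer, which receives assets to store."""

    def __init__(self):
        self._assets = {}

    def receive(self, asset: bytes) -> tuple[str, bool]:
        """Return (identifier, stored_now).

        stored_now is False when an identical asset was already held,
        in which case no second copy is written.
        """
        identifier = hashlib.sha256(asset).hexdigest()
        if identifier in self._assets:
            return identifier, False
        self._assets[identifier] = asset
        return identifier, True

    def count(self) -> int:
        return len(self._assets)
```

Since identical content always yields the same identifier, the membership test doubles as duplicate detection.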
Abstract:
Some embodiments are directed to a technique for storing and/or locating content units stored on an object addressable storage (OAS) system, wherein each content unit is identified by an object identifier. The OAS system may comprise a plurality of zones, each of which stores content units. A mapping process may be defined that maps object identifiers for content units to zones on the OAS system. Thus, the storage location for a content unit on the OAS system may be the zone on the OAS system to which the object identifier for the content unit maps.
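A simple mapping process of this kind could hash the object identifier and reduce it modulo the number of zones. That particular mapping is an assumption made for illustration, as are the `zone_for` and `ZonedStore` names; the abstract does not specify how identifiers map to zones.

```python
import hashlib


def zone_for(object_id: str, num_zones: int) -> int:
    """Deterministically map an object identifier to a zone index."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_zones


class ZonedStore:
    def __init__(self, num_zones: int):
        self.zones = [{} for _ in range(num_zones)]

    def write(self, object_id: str, content: bytes) -> None:
        zone = self.zones[zone_for(object_id, len(self.zones))]
        zone[object_id] = content

    def read(self, object_id: str):
        # The same mapping locates the zone, so no global search is needed.
        zone = self.zones[zone_for(object_id, len(self.zones))]
        return zone.get(object_id)
```

Because the mapping is a pure function of the identifier, the zone that holds a content unit can be computed rather than looked up.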
Abstract:
One embodiment is a system for locating content on a storage system, in which the storage system provides a location hint to the host of where the data is physically stored, which the host can resubmit with future access requests. In another embodiment, an index that maps content addresses to physical storage locations is cached on the storage system. In yet another embodiment, intrinsic locations are used to select a storage location for newly written data based on an address of the data. In a further embodiment, units of data that are stored at approximately the same time have location index entries that are proximate in the index.
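The location-hint embodiment can be illustrated with a toy storage system that returns a physical location alongside the data and accepts it back on later reads. The `Storage` class, the use of a list index as the "physical location", and the `index_lookups` counter are all assumptions made for the sketch.

```python
class Storage:
    def __init__(self):
        self._blocks = []       # position in this list is the physical location
        self._index = {}        # content address -> physical location
        self.index_lookups = 0  # counts how often the index had to be consulted

    def write(self, address: str, data: bytes) -> int:
        """Store data and return its location, usable as a hint."""
        self._blocks.append((address, data))
        location = len(self._blocks) - 1
        self._index[address] = location
        return location

    def read(self, address: str, hint=None):
        """Return (data, hint). A valid hint skips the index lookup."""
        if hint is not None and 0 <= hint < len(self._blocks):
            stored_address, data = self._blocks[hint]
            if stored_address == address:
                return data, hint
        # No hint, or a stale one: fall back to the index.
        self.index_lookups += 1
        location = self._index[address]
        return self._blocks[location][1], location
```

The hint is checked against the stored address before use, so a stale or wrong hint costs only a fallback lookup and never returns the wrong data.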
Abstract:
A system and method are provided for retention of data on a storage system. An application program provides the storage system with data to be stored on the storage system. The application program also provides the storage system with a retention period that indicates a period of time for which the data may not be deleted. When the storage system receives a request to delete the data, it first evaluates the retention period associated with that data to determine if the retention period has expired. If the retention period has not expired, the storage system denies the request to delete the data.
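The retention check on delete can be sketched as below. The `RetentionStore` class, its method names, and the use of seconds as the unit are hypothetical; the injectable `clock` exists only to make the behavior easy to demonstrate.

```python
import time


class RetentionStore:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (data, time at which retention expires)

    def store(self, key: str, data: bytes, retention_seconds: float) -> None:
        """The application supplies the data together with its retention period."""
        self._items[key] = (data, self._clock() + retention_seconds)

    def delete(self, key: str) -> bool:
        """Deny the request (return False) while the retention period is running."""
        _, expires_at = self._items[key]
        if self._clock() < expires_at:
            return False
        del self._items[key]
        return True

    def contains(self, key: str) -> bool:
        return key in self._items
```

Enforcing the check inside the storage system, rather than in the application, means a delete request cannot bypass the retention period.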
Abstract:
One embodiment is directed to the deletion of content units from a storage system. When a content unit is deleted, a reflection may be created and stored on the storage system. The reflection identifies the deleted content unit and may include additional information, such as a portion of the content of the content unit and audit information regarding the deletion of the content unit.
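A reflection might be modeled as a small record written at deletion time. The field names, the `ReflectingStore` class, the SHA-256 identifiers, and the choice to keep a short content prefix are all assumptions for illustration.

```python
import hashlib
import time


class ReflectingStore:
    def __init__(self):
        self._units = {}
        self._reflections = {}

    def write(self, content: bytes) -> str:
        unit_id = hashlib.sha256(content).hexdigest()
        self._units[unit_id] = content
        return unit_id

    def read(self, unit_id: str):
        return self._units.get(unit_id)

    def delete(self, unit_id: str, requested_by: str, preview_bytes: int = 16) -> None:
        """Remove the content unit and leave a reflection in its place."""
        content = self._units.pop(unit_id)
        self._reflections[unit_id] = {
            "deleted_unit": unit_id,                    # identifies what was deleted
            "content_prefix": content[:preview_bytes],  # a portion of the content
            "deleted_by": requested_by,                 # audit information
            "deleted_at": time.time(),
        }

    def reflection(self, unit_id: str):
        return self._reflections.get(unit_id)
```

After a delete, a reader can distinguish "never stored" from "stored and later deleted" by checking for a reflection.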
Abstract:
An environment and method are provided for increasing the storage capacity of a data storage environment. Additional storage clusters may be added to the storage environment without affecting the performance of each individual storage cluster. When data is written to the storage environment, a selection may be made as to which storage cluster is to store the data. When data is read from the storage environment, it may be determined which storage cluster stores the data and the data may be retrieved from that storage cluster.
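As a rough sketch of cluster selection, the example below writes to whichever cluster has the most free space and reads by asking each cluster in turn. Both policies are assumptions chosen for simplicity; the abstract does not say how the selection or the lookup is made, and the class names are hypothetical. Overwriting an existing key is not handled.

```python
class Cluster:
    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.data = {}

    def free(self) -> int:
        return self.capacity - sum(len(v) for v in self.data.values())


class StorageEnvironment:
    def __init__(self, clusters):
        self.clusters = list(clusters)

    def add_cluster(self, cluster: Cluster) -> None:
        """Grow capacity without touching the existing clusters."""
        self.clusters.append(cluster)

    def write(self, key: str, data: bytes) -> str:
        # Assumed selection policy: the cluster with the most free space.
        target = max(self.clusters, key=lambda c: c.free())
        target.data[key] = data
        return target.name

    def read(self, key: str) -> bytes:
        # Assumed lookup: determine which cluster holds the key by asking each.
        for cluster in self.clusters:
            if key in cluster.data:
                return cluster.data[key]
        raise KeyError(key)
```

Adding a cluster changes only where future writes land; data already written stays in place and remains readable.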