Abstract:
Embodiments of the present invention provide mechanisms that overcome limitations of existing indexes by creating two-dimensional (2D) spatial indexes to quickly locate data containers that match two or more predicates. This is accomplished by representing metadata attributes describing a data container as dimensions in a 2D space so that a data container can be expressed as a point or a cell in a 2D space with coordinates being a pair of values of the selected attributes. A space filling curve is used to traverse the 2D space and convert each pair of the 2D coordinates to a single space filling curve value. A 2D spatial index is then created based on the computed space filling curve values so that one value can be associated with one or more points (data containers) in the index. Advantageously, the created spatial index provides for searching and processing fewer metadata entries, thereby decreasing the time typically used to search for data.
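The mapping from a pair of attribute values to a single curve value can be sketched as follows. This is a minimal illustration only: the abstract does not say which space filling curve is used, so a Z-order (Morton) curve is assumed here, and the names `interleave_bits`, `deinterleave_bits`, and `SpatialIndex` are hypothetical. The two coordinates are assumed to be small non-negative integers, such as bucketed file size and age.

```python
def interleave_bits(x, y, bits=16):
    """Z-order (Morton) value: bit i of x lands at position 2i, bit i of y at 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def deinterleave_bits(z, bits=16):
    """Inverse of interleave_bits: recover the (x, y) pair from a curve value."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


class SpatialIndex:
    """Hypothetical index: one curve value maps to one or more data containers."""

    def __init__(self):
        self._cells = {}  # curve value -> list of container names

    def add(self, container, x, y):
        self._cells.setdefault(interleave_bits(x, y), []).append(container)

    def lookup(self, x, y):
        # Both predicates are answered with a single key lookup.
        return list(self._cells.get(interleave_bits(x, y), []))

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        # Z-order values never decrease when either coordinate grows, so every
        # cell inside the box lies between the curve values of its two corners.
        z_lo = interleave_bits(x_lo, y_lo)
        z_hi = interleave_bits(x_hi, y_hi)
        hits = []
        for z in sorted(self._cells):
            if z_lo <= z <= z_hi:
                x, y = deinterleave_bits(z)
                # The curve range can include cells outside the box; filter them.
                if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
                    hits.extend(self._cells[z])
        return hits
```

A production index would keep the curve values in a sorted structure such as a B-tree so that the range scan touches only the relevant entries; the linear scan above is kept for brevity.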
Abstract:
A network storage server system includes a distributed object store, a presentation layer, a metadata subsystem, and a content management subsystem. The object store has no namespace and provides location-independent addressing of data objects. The presentation layer provides multiple interfaces for accessing data stored in the object store, including a NAS interface and a Web Service interface, and provides at least one namespace for accessing data via the NAS interface or the Web Service interface. The Web Service interface allows access to stored data via the namespace or without using the namespace (“raw object” access). The metadata subsystem stores user-specified and/or system-generated metadata relating to data objects and allows data objects to be identified and retrieved by searching on the metadata. The content management subsystem autonomously manages lifecycles of data objects according to user-specified policies, based on metadata associated with the data objects and tracked by the metadata subsystem.
Abstract:
A system and method for nearly in-band search indexing. A network switch (or other intermediate network device) is configured to provide port mirroring so that data access requests directed to a storage system are forwarded to both the storage system and a search appliance. The search appliance collects index information from the received data access requests to update a search index. Because the search appliance is nearly in-band, i.e., not directly in the data access request path, the storage system processes data access requests without added latency.
Abstract:
One embodiment of the present invention provides a system that facilitates delayed block allocation in a distributed file system. During operation, the system receives a write command at a client, wherein the write command includes a buffer containing data to be written and a file identifier. In response to receiving the write command, the system reserves a set of disk blocks for the file from a virtual pool of disk blocks allocated to the client. The system also transfers the data to be written to the kernel of the client where the data waits to be transferred to the disk.
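The reservation step can be sketched as below, under several assumptions not stated in the abstract: the virtual pool is modeled as a simple block counter, the block size is 4096 bytes, and the class name `ClientBlockPool` and its fields are hypothetical. Only the client-side bookkeeping is shown; the actual placement of blocks on disk is deliberately left out because it is deferred.

```python
class ClientBlockPool:
    """Hypothetical client-side view of a virtual pool of disk blocks."""

    def __init__(self, total_blocks, block_size=4096):
        self.free = total_blocks   # blocks still available in the virtual pool
        self.block_size = block_size
        self.reserved = {}         # file identifier -> number of reserved blocks
        self.pending = []          # (file identifier, data) waiting to reach disk

    def write(self, file_id, data):
        # Reserve enough blocks for the buffer (ceiling division), but do not
        # decide yet where on disk they will live.
        needed = -(-len(data) // self.block_size)
        if needed > self.free:
            raise RuntimeError("virtual pool exhausted")
        self.free -= needed
        self.reserved[file_id] = self.reserved.get(file_id, 0) + needed
        # Stand-in for handing the data to the client kernel, where it waits.
        self.pending.append((file_id, data))
        return needed
```

Because the reservation is taken from a pool already allocated to the client, the write can be acknowledged without contacting any other node in this sketch.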
Abstract:
A system and method for improving the relevance of search results using data container access patterns. An indexing process tracks data access patterns and updates an access data structure. When executing a search operation, a search process first identifies a set of data containers containing the search terms. The search process then utilizes the access data structure to rank the identified data containers based on the collected data access pattern information.
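The two-phase search described above can be sketched as follows. The abstract does not specify what the access data structure records, so this sketch assumes a plain per-container access count; the class `AccessRankedSearch` and its methods are hypothetical names.

```python
from collections import Counter


class AccessRankedSearch:
    """Hypothetical search process that ranks matches by recorded accesses."""

    def __init__(self):
        self.index = {}                 # term -> set of container names
        self.access_counts = Counter()  # the "access data structure" (assumed)

    def add_container(self, name, text):
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(name)

    def record_access(self, name):
        # Called by the indexing process whenever a container is accessed.
        self.access_counts[name] += 1

    def search(self, *terms):
        if not terms:
            return []
        # Phase 1: containers that contain every search term.
        candidate_sets = [self.index.get(t.lower(), set()) for t in terms]
        matches = set.intersection(*candidate_sets)
        # Phase 2: most frequently accessed first, name as a tie-breaker.
        return sorted(matches, key=lambda n: (-self.access_counts[n], n))
```

Other signals, such as recency or the identity of the accessing user, could replace the raw count without changing the overall structure.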
Abstract:
Described herein is a flash remapping (FR) layer in a storage operating system for utilizing flash memory as a secondary permanent storage device in a storage system. The FR layer collects particular information (specified by collection parameters) of received access requests for data stored on primary storage devices of the storage system. Based on the collected information and a predetermined access pattern (specified by pattern parameters), the FR layer selects data sets on the primary storage devices to be transferred permanently to flash memory, whereby subsequent access requests to the selected data sets are redirected to flash memory. New parameters may be received by the FR layer (from a user or program) to dynamically reconfigure the functions of the FR layer. The FR layer may be implemented in the operating system without requiring other code of the storage operating system to be modified.
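A rough sketch of the collect, select, and redirect cycle follows. The abstract does not list the actual collection or pattern parameters, so this example assumes one of each: only read requests are counted, and a data set qualifies once its read count reaches a threshold. `FlashRemapper`, `min_reads`, and the data set labels are all hypothetical.

```python
from collections import Counter


class FlashRemapper:
    """Hypothetical stand-in for the FR layer's bookkeeping."""

    def __init__(self, min_reads):
        self.min_reads = min_reads    # assumed pattern parameter
        self.read_counts = Counter()  # collected information (reads only)
        self.remapped = set()         # data sets now served from flash

    def on_request(self, op, dataset):
        # Assumed collection parameter: record read requests only.
        if op == "read":
            self.read_counts[dataset] += 1
        # Redirect requests for data sets that were already transferred.
        return "flash" if dataset in self.remapped else "primary"

    def select(self):
        # Pick data sets whose collected counts match the access pattern.
        hot = {d for d, n in self.read_counts.items() if n >= self.min_reads}
        self.remapped |= hot
        return sorted(hot)

    def reconfigure(self, min_reads):
        # New parameters change behavior without touching other code.
        self.min_reads = min_reads
```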
Abstract:
A system and method accelerate updates of a metadata search database using persistent consistency point image (PCPI) differencing. After first populating the search database, a search agent generates a PCPI and utilizes a PCPI differencing technique to quickly identify changes between the inode files of first and second PCPIs. The differences are noted as modified metadata and are written to a log file, which is later read by the search agent to update the search database.
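The differencing and log replay steps can be sketched as below. This is a simplified model, not the differencing technique itself: each PCPI's inode file is assumed to be a dictionary from inode number to a metadata record, and every inode is compared, whereas a real implementation would presumably skip unchanged regions wholesale. The function names and the log entry layout are hypothetical.

```python
def diff_inode_files(old, new):
    """Return log entries describing how `new` differs from `old`.

    Both arguments map inode number -> metadata dict (assumed layout).
    """
    log = []
    for ino in sorted(old.keys() | new.keys()):
        before, after = old.get(ino), new.get(ino)
        if before is None:
            log.append(("created", ino, after))
        elif after is None:
            log.append(("deleted", ino, None))
        elif before != after:
            log.append(("modified", ino, after))
    return log


def apply_log(search_db, log):
    """Replay the log against the search database (inode number -> metadata)."""
    for op, ino, meta in log:
        if op == "deleted":
            search_db.pop(ino, None)
        else:
            search_db[ino] = meta
```

Only the entries in the log are touched during the update, so the cost scales with the number of changed inodes rather than the size of the file system.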
Abstract:
Techniques for a data storage cluster and a method for deduplicating data in the data storage cluster in a scalable manner, by (among other things) using an epoch-based global chunk data structure, are disclosed herein. A global chunk data structure for an epoch is distributed and maintained at a plurality of metadata nodes within the data storage cluster. Fingerprints and identifiers of data chunks written to the cluster after a particular epoch are written to delta chunk data structures stored in different metadata nodes of the cluster. When the data storage cluster advances to the next epoch, the global chunk data structure is updated using the delta chunk data structures. At any given time, data deduplication in the data storage cluster can be conducted based on the global chunk data structure for the current epoch.
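The interplay between the global and delta structures can be sketched as follows. Several details here are assumptions rather than claims from the abstract: fingerprints are SHA-256 digests, a fingerprint is routed to a metadata node by a simple modulo, and duplicates written within the same epoch are only detected when the epoch advances. `MetadataNode` and `Cluster` are hypothetical names.

```python
import hashlib


class MetadataNode:
    def __init__(self):
        self.global_chunks = {}  # fingerprint -> chunk id, fixed during an epoch
        self.delta_chunks = []   # (fingerprint, chunk id) written this epoch


class Cluster:
    def __init__(self, num_nodes):
        self.nodes = [MetadataNode() for _ in range(num_nodes)]
        self.epoch = 0
        self.next_id = 0

    def _node_for(self, fingerprint):
        # Assumed routing: partition the fingerprint space across nodes.
        return self.nodes[int(fingerprint[:8], 16) % len(self.nodes)]

    def write_chunk(self, data):
        fp = hashlib.sha256(data).hexdigest()
        node = self._node_for(fp)
        if fp in node.global_chunks:
            return node.global_chunks[fp]  # deduplicated against current epoch
        chunk_id = self.next_id
        self.next_id += 1
        node.delta_chunks.append((fp, chunk_id))
        return chunk_id

    def advance_epoch(self):
        """Merge delta structures into the global one; report late duplicates."""
        duplicates = []
        for node in self.nodes:
            for fp, chunk_id in node.delta_chunks:
                canonical = node.global_chunks.setdefault(fp, chunk_id)
                if canonical != chunk_id:
                    duplicates.append((chunk_id, canonical))
            node.delta_chunks = []
        self.epoch += 1
        return duplicates
```

Keeping the global structure unchanged during an epoch means lookups on the write path never contend with updates; the cost is that duplicates written within one epoch are reconciled only at the epoch boundary in this sketch.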
Abstract:
A method and apparatus for proxying search requests for a storage system and maintaining a central index for performing the search requests are described herein. An index manager on the storage system may initially produce the central index by examining each file in a file system and may thereafter update the central index by examining only those files that have changed since the central index was initially produced or last updated. The index manager may receive a changed file list from a differencing layer configured to compare snapshots of the file system taken at different points in time. A search proxy module may receive search requests in one search protocol and proxy them to a search engine by converting them to another search protocol compatible with the search engine. The search engine may then use the central index to perform the search requests.
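The build-once, update-incrementally behavior of the index manager can be sketched as below. Snapshots are modeled, purely for illustration, as dictionaries from file path to file text; the functions and the `CentralIndex` class are hypothetical, and the protocol conversion performed by the search proxy is not shown.

```python
def changed_files(old_snap, new_snap):
    """Differencing layer stand-in: paths added, removed, or modified."""
    paths = old_snap.keys() | new_snap.keys()
    return sorted(p for p in paths if old_snap.get(p) != new_snap.get(p))


class CentralIndex:
    def __init__(self):
        self.terms_by_path = {}  # file path -> set of terms in that file

    def _index_file(self, path, text):
        self.terms_by_path[path] = set(text.lower().split())

    def build(self, snap):
        # Initial pass: examine every file in the file system.
        for path, text in snap.items():
            self._index_file(path, text)

    def update(self, snap, changed):
        # Later passes: examine only the files on the changed file list.
        for path in changed:
            if path in snap:
                self._index_file(path, snap[path])
            else:
                self.terms_by_path.pop(path, None)  # file was deleted

    def search(self, term):
        term = term.lower()
        return sorted(p for p, terms in self.terms_by_path.items() if term in terms)
```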
Abstract:
Example embodiments provide various techniques for fast and efficient search of attributes stored in data structures. The attributes are organized following a hierarchical structure of the file system and, in an example, the attributes are stored in a data structure where the hierarchical structure is maintained. As a result, a search within such data structure may follow one or more paths along the hierarchical structure of the file system. Attributes associated with directories and files outside of the path can be excluded from the search. Example embodiments also provide various techniques for updating signatures associated with the attributes. In an example, updates to the signatures can be made incrementally. For example, signatures can be updated when the attributes change.
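A path-scoped search over hierarchically stored attributes can be sketched as follows. The abstract does not define the data structure or the signatures, so this example assumes a plain in-memory tree that mirrors the directory hierarchy and leaves signatures out entirely; `Node`, `add`, and `search` are hypothetical names.

```python
class Node:
    """One directory or file; children mirror the file system hierarchy."""

    def __init__(self, name):
        self.name = name
        self.attrs = {}     # attributes of this entry (empty for plain directories)
        self.children = {}  # child name -> Node


def _parts(path):
    return [p for p in path.strip("/").split("/") if p]


def add(root, path, attrs):
    node = root
    for part in _parts(path):
        node = node.children.setdefault(part, Node(part))
    node.attrs = attrs
    return node


def search(root, start_path, predicate):
    """Return paths under start_path whose attributes satisfy predicate."""
    parts = _parts(start_path)
    node = root
    # Follow a single path down the hierarchy; siblings are never visited.
    for part in parts:
        node = node.children.get(part)
        if node is None:
            return []
    hits = []
    stack = [("/" + "/".join(parts) if parts else "", node)]
    while stack:
        path, cur = stack.pop()
        if cur.attrs and predicate(cur.attrs):
            hits.append(path or "/")
        for name, child in cur.children.items():
            stack.append((path + "/" + name, child))
    return sorted(hits)
```

Entries outside `start_path` are excluded simply because the walk never reaches them, which is the property the abstract highlights.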