Abstract:
An apparatus and method for analyzing bottlenecks in a data distributed processing system are provided. The apparatus includes a learning unit that mines and learns bottleneck-feature association rules based on hardware information related to a bottleneck node, job configuration information related to a bottleneck-causing job, and/or I/O information regarding a bottleneck-causing task. Based on the bottleneck-feature association rules, a bottleneck cause analyzing unit detects a bottleneck node among multiple nodes performing tasks in the data distributed processing system and analyzes the cause of the bottleneck.
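The rule-mining step can be illustrated with a minimal sketch. It assumes each observed task is reduced to a set of feature labels plus a bottleneck flag, and it mines only single-feature rules of the form "feature implies bottleneck" using the standard support and confidence measures. The function name, feature labels, and thresholds are hypothetical; the abstract does not specify the mining algorithm.

```python
from collections import Counter

def mine_rules(records, min_support, min_confidence):
    """Mine single-feature rules of the form: feature -> bottleneck.

    records: list of (features, is_bottleneck) pairs, where features is a
    set of labels such as hardware, job-configuration, or I/O attributes
    (hypothetical encoding, not taken from the abstract).
    Returns {feature: (support, confidence)} for rules passing both thresholds.
    """
    n = len(records)
    feature_count = Counter()   # how often each feature was observed
    joint_count = Counter()     # how often it was observed with a bottleneck
    for features, is_bottleneck in records:
        for f in features:
            feature_count[f] += 1
            if is_bottleneck:
                joint_count[f] += 1
    rules = {}
    for f, joint in joint_count.items():
        support = joint / n                    # P(feature and bottleneck)
        confidence = joint / feature_count[f]  # P(bottleneck | feature)
        if support >= min_support and confidence >= min_confidence:
            rules[f] = (support, confidence)
    return rules
```

A node whose observed features match a mined rule could then be flagged as a bottleneck candidate, with the matching feature reported as the likely cause.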
Abstract:
A key-value storage device includes a non-volatile memory and a controller. A method of operating the key-value storage device includes: receiving, from a host, information regarding at least one of a random region, comprising random bits, and a non-random region each included in a key; receiving, from the host, a first command including a first key; generating, based on the received information, a mapping index of a mapping table from first bits, the first bits corresponding to at least some of the random bits included in the first key; and controlling an operation for the non-volatile memory, according to the first command, by using the mapping table.
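A minimal sketch of the index-generation step follows. It assumes the host describes the random region as a bit offset and bit length counted from the most significant bit of the key, and that the index is taken from the leading bits of that region. The function and parameter names are illustrative and do not come from the abstract.

```python
def mapping_index(key: bytes, random_offset: int, random_len: int,
                  index_bits: int) -> int:
    """Build a mapping-table index from bits in the key's random region.

    random_offset and random_len describe the random region in bits, counted
    from the most significant bit of the key (a hypothetical encoding of the
    host-supplied region information).
    """
    total_bits = len(key) * 8
    if random_offset + random_len > total_bits:
        raise ValueError("random region exceeds key length")
    if index_bits > random_len:
        raise ValueError("not enough random bits for the requested index width")
    value = int.from_bytes(key, "big")
    # Shift so the first index_bits of the random region become the low bits.
    shift = total_bits - random_offset - index_bits
    return (value >> shift) & ((1 << index_bits) - 1)
```

Drawing the index from the random bits rather than from a fixed prefix such as "user" spreads keys across the mapping table even when many keys share the same non-random region.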
Abstract:
Provided are a method of performing garbage collection and a redundant array of independent disks (RAID) storage system to which the method is applied. The method includes selecting a victim stripe for performing the garbage collection in the RAID storage system based on a ratio of valid pages. Valid pages included in the victim stripe are copied to a non-volatile cache memory. Garbage collection is performed with respect to the victim stripe by using data copied to the non-volatile cache memory.
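The selection and copy steps can be sketched as follows. The sketch assumes the victim is the stripe with the lowest ratio of valid pages (the abstract says only that selection is based on the ratio) and models the non-volatile cache as a plain list. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Stripe:
    stripe_id: int
    pages: list  # page data, or None where the page is invalid; assumed non-empty

    def valid_ratio(self) -> float:
        valid = sum(1 for p in self.pages if p is not None)
        return valid / len(self.pages)

def select_victim(stripes):
    # Assumption: the lowest valid-page ratio means the least data to copy.
    return min(stripes, key=lambda s: s.valid_ratio())

def collect(stripes, nv_cache: list) -> int:
    """Run one garbage-collection pass and return the victim's stripe id."""
    victim = select_victim(stripes)
    # Copy the still-valid pages into the (simulated) non-volatile cache first.
    nv_cache.extend(p for p in victim.pages if p is not None)
    # The stripe can now be reclaimed; its valid data survives in the cache.
    victim.pages = [None] * len(victim.pages)
    return victim.stripe_id
```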
Abstract:
A redundant array of independent disks (RAID) storage system includes a plurality of storage devices that perform an erase operation according to a plurality of erase unit sizes. A RAID controller controls the plurality of storage devices based on a RAID environment. The RAID controller adjusts the erase unit sizes with respect to at least some of the plurality of storage devices, respectively.
Abstract:
Provided are a method and a system for transmitting data between storage devices over peer-to-peer (P2P) connections of peripheral component interconnect-express (PCIe). The method, performed when a first storage device receives a data request from a host, includes caching data of a second storage device via a PCIe connection in response to the data request, and transmitting the cached data to the host. The first storage device is configured to convert a logical address received with the data request to a physical address of a memory region of the second storage device, to store data transmitted from the second storage device via the PCIe connection in a second data cache according to the converted physical address, and to perform a cache replacement scheme for the data stored in the second data cache.
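The address conversion, peer fetch, and cache replacement can be sketched together. The abstract does not name the replacement scheme, so this sketch assumes least-recently-used (LRU) eviction. The class name, the dictionary-based address map, and the `fetch_from_peer` callback standing in for the PCIe P2P transfer are all hypothetical.

```python
from collections import OrderedDict

class PeerDataCache:
    """Cache, in the first device, of data that lives on a second device."""

    def __init__(self, capacity, logical_to_physical, fetch_from_peer):
        self.capacity = capacity
        self.l2p = logical_to_physical          # logical -> peer physical address
        self.fetch_from_peer = fetch_from_peer  # stand-in for the P2P transfer
        self.cache = OrderedDict()              # physical address -> data

    def read(self, logical_addr):
        phys = self.l2p[logical_addr]           # convert the logical address
        if phys in self.cache:
            self.cache.move_to_end(phys)        # mark as most recently used
            return self.cache[phys]
        data = self.fetch_from_peer(phys)       # miss: pull from the peer device
        self.cache[phys] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict the least recently used
        return data
```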
Abstract:
A storage controller includes a co-access pattern mining unit configured to detect co-access patterns of data co-accessed during a particular time duration and to generate co-access groups including a plurality of pieces of data complying with the co-access patterns. The storage controller further includes a co-access group matching unit configured to select a co-access group matched with read-requested data, among the generated co-access groups, and a data placement unit configured to store the data included in the selected co-access group in a pre-fetch buffer.
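The three units can be sketched as three small functions. This sketch assumes a simple mining rule: the access trace is cut into fixed time windows, and any set of two or more blocks that appears as a whole window at least `min_support` times becomes a co-access group. The window scheme, the threshold, and every name here are hypothetical; the abstract does not describe the mining method.

```python
from collections import Counter

def mine_co_access_groups(trace, window, min_support):
    """trace: iterable of (timestamp, block_id). Returns a list of frozensets."""
    windows = {}
    for ts, block in trace:
        # Blocks accessed within the same time window count as co-accessed.
        windows.setdefault(int(ts // window), set()).add(block)
    counts = Counter(frozenset(b) for b in windows.values() if len(b) > 1)
    return [group for group, n in counts.items() if n >= min_support]

def match_group(groups, block):
    """Return the first co-access group containing the requested block."""
    for group in groups:
        if block in group:
            return group
    return None

def prefetch(groups, block, buffer: set) -> set:
    """Place the rest of the matched group into the pre-fetch buffer."""
    group = match_group(groups, block)
    if group is not None:
        buffer.update(group - {block})
    return buffer
```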
Abstract:
A method of operating a storage system includes executing a storage service that provides storage in units of volumes to at least one host device, in which the volumes include a first volume and a second volume. The method includes giving a first priority and a second priority lower than the first priority to the first volume and the second volume, respectively, and recovering meta-data for the first volume having the first priority when the storage service is stopped. The method includes starting the storage service using the recovered meta-data for the first volume, and recovering meta-data for the second volume having the second priority.
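The ordering of recovery and service start can be sketched as below. The sketch assumes a smaller number means a higher priority and represents each volume as a dictionary; `recover_metadata` and `start_service` are hypothetical callbacks standing in for the real recovery and service-start routines.

```python
def recover_in_priority_order(volumes, recover_metadata, start_service):
    """Recover the highest-priority volume, start the service, then the rest.

    volumes: list of dicts with "name" and "priority" keys, where a smaller
    priority number means higher priority (an assumption of this sketch).
    """
    ordered = sorted(volumes, key=lambda v: v["priority"])
    if not ordered:
        return {}
    first, rest = ordered[0], ordered[1:]
    recovered = {first["name"]: recover_metadata(first)}
    # The service resumes as soon as the first-priority meta-data is back.
    start_service(recovered)
    # Lower-priority volumes are recovered after the service is running.
    for volume in rest:
        recovered[volume["name"]] = recover_metadata(volume)
    return recovered
```

Starting the service before the lower-priority volumes are recovered shortens the outage for the first volume, at the cost of the second volume remaining unavailable for longer.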
Abstract:
Data deduplication is performed by separating data into a plurality of data chunks that correspond to first through Nth positions and include symbols; calculating discrimination indices of the positions using frequencies of the symbols in the different positions; arranging the order of the positions based on values of the discrimination indices; and generating fingerprints of the data through combination of data chunks that correspond to a number of the positions, based on the arranged order of the positions.
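The steps above can be sketched under one loudly stated assumption: the discrimination index of a position is taken to be the Shannon entropy of the symbol frequencies observed at that position, so positions whose symbols vary more rank higher. The abstract does not define the index, and the function names and the parameter `k` (how many positions feed the fingerprint) are hypothetical.

```python
import math
from collections import Counter

def discrimination_indices(items):
    """items: equal-length sequences, one symbol (chunk) per position."""
    indices = []
    for pos in range(len(items[0])):
        freq = Counter(item[pos] for item in items)
        total = sum(freq.values())
        # Assumed index: Shannon entropy of the symbols seen at this position.
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in freq.values())
        indices.append(entropy)
    return indices

def position_order(indices):
    """Positions sorted from most to least discriminating."""
    return sorted(range(len(indices)), key=lambda p: indices[p], reverse=True)

def fingerprint(item, order, k):
    """Combine the chunks at the k most discriminating positions."""
    return tuple(item[p] for p in order[:k])
```

Positions where every item carries the same symbol score zero and fall to the end of the order, so they contribute to a fingerprint only when `k` is large.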
Abstract:
A storage system includes a control unit which receives data from a client, and a storage device which stores the data. The control unit includes a deduplicator which determines whether the data is duplicate or not and generates duplicate information based on the determination result. The storage device includes a mapping table that includes logical block address (LBA)-physical block address (PBA) translation information and the duplicate information.
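A minimal sketch of this split of responsibilities follows. It assumes the deduplicator detects duplicates by SHA-256 digest and that the duplicate information is a boolean flag stored next to the PBA in each mapping-table entry. The class names and the entry layout are hypothetical, and overwriting an already written LBA is not handled.

```python
import hashlib

class StorageDevice:
    def __init__(self):
        self.blocks = []    # physical blocks; the list index serves as the PBA
        self.mapping = {}   # LBA -> (PBA, is_duplicate)

    def write(self, lba, data, duplicate_of=None):
        if duplicate_of is None:
            self.blocks.append(data)
            self.mapping[lba] = (len(self.blocks) - 1, False)
        else:
            # Duplicate: point at the existing block instead of storing again.
            pba, _ = self.mapping[duplicate_of]
            self.mapping[lba] = (pba, True)

    def read(self, lba):
        pba, _ = self.mapping[lba]
        return self.blocks[pba]

class ControlUnit:
    def __init__(self, device):
        self.device = device
        self.seen = {}      # content digest -> LBA of the first copy

    def write(self, lba, data):
        digest = hashlib.sha256(data).digest()
        original = self.seen.get(digest)
        if original is None:
            self.seen[digest] = lba
        # The duplicate information travels with the write to the device.
        self.device.write(lba, data, duplicate_of=original)
```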
Abstract:
A deduplication method using data association information includes extracting information about a target file and at least one reference file associated with the target file as association information before duplication determination is performed. The at least one reference file is identified by the association information as a comparison target set for comparison when the duplication determination of the target file is performed. The duplication determination is then performed on the target file against the at least one reference file in the comparison target set.