Abstract:
This invention relates to an inverted index storage structure that maps each keyword to the storage space holding its posting list. In particular, the invention relates to an index structure that enables fast retrieval of the posting of a specific document from a posting list and keeps each posting list efficiently arranged and maintained in document identifier (docID) order, so that documents can be added, deleted, modified, and retrieved quickly in environments where a database management system is tightly coupled with information retrieval. The technical solution is to store each posting list in a large object and to associate with each posting list a subindex that maps a docID to the posting containing that docID.
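The idea of a per-keyword posting list kept in docID order, with a subindex for direct lookup by docID, can be sketched as follows. This is only an illustrative in-memory sketch, not the claimed storage structure: a sorted Python list searched with `bisect` stands in for the subindex, plain lists stand in for the large object, and the names `PostingList`, `upsert`, `get`, and `delete` are hypothetical.

```python
import bisect
from typing import Dict, List, Optional


class PostingList:
    """Postings for one keyword, kept in docID order (illustrative only)."""

    def __init__(self) -> None:
        self._doc_ids: List[int] = []         # sorted docIDs; stand-in for the subindex
        self._postings: List[List[int]] = []  # word positions, parallel to _doc_ids

    def _find(self, doc_id: int) -> int:
        """Return the slot where doc_id is, or where it would be inserted."""
        return bisect.bisect_left(self._doc_ids, doc_id)

    def upsert(self, doc_id: int, positions: List[int]) -> None:
        """Add a posting, or replace it if the document is already present."""
        i = self._find(doc_id)
        if i < len(self._doc_ids) and self._doc_ids[i] == doc_id:
            self._postings[i] = positions
        else:
            self._doc_ids.insert(i, doc_id)
            self._postings.insert(i, positions)

    def get(self, doc_id: int) -> Optional[List[int]]:
        """Return the posting of one document without scanning the list."""
        i = self._find(doc_id)
        if i < len(self._doc_ids) and self._doc_ids[i] == doc_id:
            return self._postings[i]
        return None

    def delete(self, doc_id: int) -> bool:
        """Remove the posting of one document; report whether it existed."""
        i = self._find(doc_id)
        if i < len(self._doc_ids) and self._doc_ids[i] == doc_id:
            del self._doc_ids[i]
            del self._postings[i]
            return True
        return False

    def doc_ids(self) -> List[int]:
        return list(self._doc_ids)


# Keyword -> posting list; a dict stands in for the keyword index.
index: Dict[str, PostingList] = {"flash": PostingList()}
index["flash"].upsert(42, [3, 17])
index["flash"].upsert(7, [1])
```

Inserting into a Python list shifts later elements, so this sketch illustrates the lookup path and the docID ordering rather than the update cost of a disk-based structure.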
Abstract:
The present invention relates to an algorithm that retrieves only the k data elements having the largest (or smallest) key values in a dataset (i.e., the top-k results) in time linearly proportional to the size of the dataset. The method scans all data elements in the dataset only once and uses a k-sized min (or max) heap to maintain the candidate elements of the top-k results. In other words, the present invention provides a linear-time top-k sort method that finds the top-k results in time linearly proportional to the size n of the dataset (i.e., O(n) time complexity for a fixed k), whereas conventional sort algorithms, which order the entire dataset to find the top-k results, require at least O(n log n) time.
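A single-scan, k-sized min-heap selection of the kind described above can be sketched with Python's standard `heapq` module. The function name `top_k_largest` is hypothetical, and this is a generic sketch rather than the claimed method. Each heap operation costs O(log k), so the scan takes O(n log k) time overall, which is linear in n when k is fixed.

```python
import heapq
from typing import Iterable, List


def top_k_largest(data: Iterable[float], k: int) -> List[float]:
    """Return the k largest values in descending order using one scan."""
    if k <= 0:
        return []
    heap: List[float] = []  # min-heap holding the current top-k candidates
    for x in data:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest candidate, so replace that candidate.
            heapq.heapreplace(heap, x)
    # Sorting only the k survivors costs O(k log k), independent of n.
    return sorted(heap, reverse=True)
```

For the smallest k values, the same scan applies with the comparison reversed and a max-heap, which with `heapq` can be emulated by negating the keys.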
Abstract:
The present invention proposes an effective and efficient method of storing data, called page-differential logging, for flash-based storage systems. The primary characteristics of the invention are: (1) it writes only the page-differential, which is defined as the difference between an original page in flash memory and the up-to-date page in memory; (2) it computes and writes the page-differential only when an updated page needs to be reflected into flash memory. When an updated page needs to be reflected into flash memory, the present invention stores the page as a base page and a differential page in flash memory. When a page is recreated from flash memory, the method reads the base page and the differential page, and then creates the page by merging the base page with its page-differential stored in the differential page. This invention significantly improves the I/O performance of flash-based storage systems compared with existing page-based and log-based methods.
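The two operations described above, computing a page-differential and recreating a page by merging the base page with its differential, can be sketched as follows. This is a minimal sketch under stated assumptions: pages are fixed-size `bytes` objects, the differential is encoded as a list of `(offset, changed bytes)` runs, and the names `compute_differential` and `merge` are hypothetical. The abstract does not specify the encoding or the flash layout.

```python
from typing import List, Tuple

# Assumed encoding: each entry is (offset in page, bytes that changed there).
Differential = List[Tuple[int, bytes]]


def compute_differential(base: bytes, updated: bytes) -> Differential:
    """Collect the runs of bytes where the updated page differs from the base page."""
    if len(base) != len(updated):
        raise ValueError("base and updated pages must have the same size")
    diff: Differential = []
    i, n = 0, len(base)
    while i < n:
        if base[i] != updated[i]:
            start = i
            while i < n and base[i] != updated[i]:
                i += 1
            diff.append((start, updated[start:i]))
        else:
            i += 1
    return diff


def merge(base: bytes, diff: Differential) -> bytes:
    """Recreate the up-to-date page from the base page and its differential."""
    page = bytearray(base)
    for offset, chunk in diff:
        page[offset:offset + len(chunk)] = chunk
    return bytes(page)


base_page = b"aaaaaaaa"
updated_page = b"aabbaaca"
differential = compute_differential(base_page, updated_page)
restored = merge(base_page, differential)
```

In this example only three changed bytes, plus their offsets, would need to be written instead of the whole page, which is the saving the method relies on.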
Abstract:
Disclosed are a structure of a two-level n-gram inverted index and methods of building the index, processing queries, and deriving the index, which reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index that uses subsequences extracted from documents as terms and a front-end inverted index that uses n-grams extracted from the subsequences as terms. The back-end inverted index uses, as terms, subsequences of a specific length extracted from the documents so that they overlap with each other by n−1 (n: the length of an n-gram), and stores the position information of the subsequences occurring in the documents in a posting list for each subsequence. The front-end inverted index uses, as terms, n-grams of a specific length extracted from the subsequences using a 1-sliding technique, and stores the position information of the n-grams occurring in the subsequences in a posting list for each n-gram.
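The two-level construction described above can be sketched as follows. This is a simplified in-memory sketch, not the claimed implementation: the names `build_two_level_index` and `ngram_positions` and the parameter `m` (subsequence length) are hypothetical, and the last subsequence of a document is assumed to be left shorter than `m` rather than padded. Subsequences are taken with a step of m − (n − 1), so that neighbors overlap by n − 1 characters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_two_level_index(docs: Dict[int, str], m: int, n: int):
    """Build (back_end, front_end) for subsequence length m and n-gram length n."""
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    step = m - (n - 1)  # adjacent subsequences overlap by n - 1 characters

    # Back end: subsequence -> [(doc_id, offset of the subsequence in the document)]
    back_end: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id, text in docs.items():
        start = 0
        while True:
            subseq = text[start:start + m]
            if len(subseq) < n:  # too short to contain any n-gram
                break
            back_end[subseq].append((doc_id, start))
            if start + m >= len(text):
                break
            start += step

    # Front end: n-gram -> [(subsequence, offset of the n-gram in the subsequence)]
    front_end: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for subseq in back_end:
        for i in range(len(subseq) - n + 1):  # 1-sliding extraction
            front_end[subseq[i:i + n]].append((subseq, i))
    return dict(back_end), dict(front_end)


def ngram_positions(back_end, front_end, gram: str) -> List[Tuple[int, int]]:
    """Recover (doc_id, position) pairs of an n-gram by joining the two levels."""
    hits = set()
    for subseq, inner in front_end.get(gram, []):
        for doc_id, outer in back_end[subseq]:
            hits.add((doc_id, outer + inner))
    return sorted(hits)


back, front = build_two_level_index({0: "ABCDAB", 1: "ABCD"}, m=4, n=2)
```

Here the subsequence "ABCD" occurs in both documents, but the offsets of its n-grams are stored only once in the front-end index, which illustrates the redundancy being removed.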
Abstract:
A query expansion method uses augmented terms to improve precision without degrading recall. The method expands an initial query by adding new terms that are related to each term of the initial query. The query is further expanded by adding augmented terms, which are conjunctions of the terms. A weight is assigned to each term so that the augmented terms have higher weights than the other terms.
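The expansion steps above can be sketched as follows, with several loudly stated assumptions: the related-term table `related` is a hypothetical input, augmented terms are assumed to be pairwise conjunctions of terms drawn from the groups of different initial query terms, and the weights 1.0 and 2.0 are placeholder values chosen only so that augmented terms outrank single terms. The abstract does not specify any of these details.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

Term = Tuple[str, ...]  # a 1-tuple is a single term; a 2-tuple is a conjunction


def expand_query(
    query_terms: List[str],
    related: Dict[str, List[str]],
    base_weight: float = 1.0,       # assumed weight for single terms
    augmented_weight: float = 2.0,  # assumed, higher weight for conjunctions
) -> Dict[Term, float]:
    """Return a weighted query containing single terms and augmented terms."""
    # Step 1: expand each initial term with its related terms.
    groups = [[t] + list(related.get(t, [])) for t in query_terms]
    weights: Dict[Term, float] = {}
    for group in groups:
        for term in group:
            weights[(term,)] = base_weight
    # Step 2: add augmented terms (conjunctions across different groups).
    for g1, g2 in combinations(groups, 2):
        for a, b in product(g1, g2):
            if a != b:
                weights[tuple(sorted((a, b)))] = augmented_weight
    return weights


expanded = expand_query(
    ["car", "repair"],
    {"car": ["automobile"], "repair": ["fix"]},
)
```

A document matching the conjunction ("automobile", "fix") would then score higher than one matching only "automobile", which is how the higher weights are meant to protect precision while the added single terms preserve recall.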
Abstract:
A subsequence matching method for time-series databases reduces the number of points stored in the multidimensional index, and can store individual points directly in the index, by dividing the data sequence into disjoint windows, using duality in constructing windows. The method reduces false alarms and improves performance by searching the index using the individual points that represent sliding windows of the query sequence and by comparing the points used in the query with the points stored in the index. Moreover, the method can create the index much faster than the previous method by reducing the number of calls to the feature extraction function, which accounts for a major part of the CPU overhead in index creation.
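The window construction described above can be illustrated with a small sketch: the data sequence is cut into disjoint windows, while the query sequence is cut into sliding windows. The function names and the window size `w` are hypothetical, and the sketch covers only window construction, not feature extraction or index search. It shows why fewer points need to be stored: each window would yield one point, so fewer windows mean fewer feature extraction calls.

```python
from typing import List, Sequence


def disjoint_windows(seq: Sequence[float], w: int) -> List[Sequence[float]]:
    """Cut a sequence into non-overlapping windows of size w (leftover tail dropped)."""
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


def sliding_windows(seq: Sequence[float], w: int) -> List[Sequence[float]]:
    """Cut a sequence into all windows of size w, one per starting offset."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


data = list(range(10))          # stand-in data sequence
query = [2, 3, 4, 5, 6, 7]      # stand-in query sequence

data_windows = disjoint_windows(data, 4)    # these would be stored in the index
query_windows = sliding_windows(query, 4)   # these would be used to search it
```

For a data sequence of length N, disjoint windows produce about N / w points instead of the N − w + 1 points that sliding windows over the data would produce.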