Inverted index storage structure using subindexes and large objects for tight coupling of information retrieval with database management systems
    1.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US06349308B1

    Publication date: 2002-02-19

    Application No.: US09250487

    Filing date: 1999-02-15

    IPC classification: G06F17/00

    Abstract: This invention relates to an inverted index storage structure that maps each keyword to the storage space holding its posting list. In particular, the index structure enables fast retrieval of the posting for a specific document from a posting list and efficient arrangement and maintenance of the posting list in document identifier (docID) order, so that documents can be added, deleted, modified, and retrieved quickly in environments where a database management system is tightly coupled with information retrieval. The technical solution is to store each posting list in a large object and to associate with each posting list a subindex that maps a docID to the posting containing that docID.
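
The structure described in this abstract can be illustrated with a minimal in-memory sketch. It is not the patented storage layout: the class names `PostingList` and `InvertedIndex` are hypothetical, a sorted Python list stands in for the subindex, and the large-object storage inside a DBMS is not modeled at all. The sketch only shows why keeping postings in docID order with a per-list lookup structure makes per-document insertion, deletion, and retrieval cheap.

```python
from bisect import bisect_left, insort


class PostingList:
    """Postings for one keyword, kept in docID order (illustrative only)."""

    def __init__(self):
        self.doc_ids = []    # sorted docIDs; hypothetical stand-in for the subindex
        self.postings = {}   # docID -> posting payload (here: word offsets)

    def add(self, doc_id, offsets):
        if doc_id not in self.postings:
            insort(self.doc_ids, doc_id)   # keeps docID order on insertion
        self.postings[doc_id] = offsets

    def delete(self, doc_id):
        if doc_id in self.postings:
            del self.postings[doc_id]
            self.doc_ids.pop(bisect_left(self.doc_ids, doc_id))

    def lookup(self, doc_id):
        """Return the posting of one document without scanning the list."""
        return self.postings.get(doc_id)


class InvertedIndex:
    """Maps each keyword to its posting list."""

    def __init__(self):
        self.lists = {}   # keyword -> PostingList

    def add_document(self, doc_id, text):
        positions = {}
        for offset, word in enumerate(text.lower().split()):
            positions.setdefault(word, []).append(offset)
        for word, offsets in positions.items():
            self.lists.setdefault(word, PostingList()).add(doc_id, offsets)

    def delete_document(self, doc_id):
        for plist in self.lists.values():
            plist.delete(doc_id)

    def search(self, keyword):
        plist = self.lists.get(keyword.lower())
        return list(plist.doc_ids) if plist else []
```

For example, after adding documents 7 and 3 that both contain "index", `search("index")` returns the docIDs in docID order regardless of insertion order, and `lookup` fetches the posting of a single document directly.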

    Linear-time top-k sort method
    2.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US08296306B1

    Publication date: 2012-10-23

    Application No.: US13304800

    Filing date: 2011-11-28

    IPC classification: G06F7/00

    CPC classification: G06F7/22

    Abstract: The present invention relates to an algorithm that retrieves only the k data elements having the largest (or smallest) key values from a dataset (i.e., the top-k results) in time linearly proportional to the size of the dataset. The proposed method finds the top-k results using a k-sized min (or max) heap that maintains candidate elements of the top-k results while scanning all data elements in the dataset only once. In other words, the invention provides a linear-time top-k sort method (i.e., O(n) time complexity), whereas conventional sort algorithms need at least O(n log n) time to find the top-k results.
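
The single-scan, k-sized heap idea in this abstract can be sketched with Python's standard `heapq` module. This is an illustration rather than the claimed method: the function name `top_k_largest` and its `key` parameter are assumptions. Each heap operation costs O(log k), so the scan takes O(n log k) overall, which grows linearly in n when k is treated as a fixed constant.

```python
import heapq


def top_k_largest(items, k, key=lambda x: x):
    """Return the k items with the largest keys, largest first, in one scan."""
    if k <= 0:
        return []
    heap = []  # min-heap of (key, sequence number, item); never more than k entries
    for seq, item in enumerate(items):
        entry = (key(item), seq, item)   # seq breaks ties so items are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new element beats the smallest current candidate: swap it in.
            heapq.heapreplace(heap, entry)
    # Sorting only the k survivors costs O(k log k), independent of n.
    return [item for _, _, item in sorted(heap, reverse=True)]
```

The root of the min-heap is always the weakest of the current candidates, so most elements are rejected after a single comparison. For the k smallest elements, the same sketch works with a negated key.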

    Method of storing data into flash memory in a DBMS-independent manner using the page-differential
    3.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US08117406B2

    Publication date: 2012-02-14

    Application No.: US12507946

    Filing date: 2009-07-23

    IPC classification: G06F13/00

    Abstract: The present invention proposes an effective and efficient data storage method, called page-differential logging, for flash-based storage systems. The primary characteristics of the invention are: (1) it writes only the page-differential, which is defined as the difference between an original page in flash memory and an up-to-date page in memory; (2) it computes and writes the page-differential only when an updated page needs to be reflected into flash memory. When an updated page needs to be reflected into flash memory, the present invention stores the page into a base page and a differential page in flash memory. When a page is recreated from flash memory, it reads the base page and the differential page, and then creates the page by merging the base page with its page-differential in the differential page. This invention significantly improves I/O performance of flash-based storage systems compared with existing page-based and log-based methods.
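
The two operations named in this abstract, computing a page-differential and recreating a page by merging it with the base page, can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: the function names and the `(offset, bytes)` run format are hypothetical, and the placement of base and differential pages in flash memory is not modeled.

```python
def compute_differential(base: bytes, updated: bytes):
    """Return (offset, changed_bytes) runs where `updated` differs from `base`."""
    if len(base) != len(updated):
        raise ValueError("pages must have the same size")
    diff, start = [], None
    for i in range(len(base)):
        if base[i] != updated[i]:
            if start is None:
                start = i                       # a changed run begins here
        elif start is not None:
            diff.append((start, updated[start:i]))  # the run ended at i
            start = None
    if start is not None:                       # run extends to the end of the page
        diff.append((start, updated[start:]))
    return diff


def merge(base: bytes, diff):
    """Recreate the up-to-date page from the base page and its differential."""
    page = bytearray(base)
    for offset, data in diff:
        page[offset:offset + len(data)] = data
    return bytes(page)
```

Only the changed runs would need to be written when the page is flushed, which is the source of the write savings the abstract describes; the cost is the extra merge step on every read of that page.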

    Two-level n-gram index structure and methods of index building, query processing and index derivation
    4.
    Document type: Granted patent
    Legal status: In force

    Publication No.: US07792840B2

    Publication date: 2010-09-07

    Application No.: US11501265

    Filing date: 2006-08-09

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30622

    Abstract: Disclosed are a structure of a two-level n-gram inverted index and methods of building the index, processing queries, and deriving the index, which reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index using subsequences extracted from documents as terms and a front-end inverted index using n-grams extracted from the subsequences as terms. The back-end inverted index uses, as terms, subsequences of a specific length extracted from the documents so that they overlap each other by n−1 (n: the length of an n-gram), and stores position information of the subsequences occurring in the documents in a posting list for each subsequence. The front-end inverted index uses, as terms, n-grams of a specific length extracted from the subsequences using a 1-sliding technique, and stores position information of the n-grams occurring in the subsequences in a posting list for each n-gram.
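
The two index levels in this abstract can be sketched in a few lines. The sketch rests on several assumptions: documents are plain strings, the function names and the subsequence length parameter `m` are hypothetical, and the final subsequence of a document is simply allowed to be shorter than `m`. Query processing and index derivation are not covered beyond a basic n-gram lookup.

```python
from collections import defaultdict


def build_two_level_index(docs, m, n):
    """Build (back_end, front_end) for docs = {doc_id: text}."""
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m")
    step = m - n + 1   # consecutive subsequences overlap by n - 1 characters
    back_end = defaultdict(list)    # subsequence -> [(doc_id, offset in document)]
    for doc_id, text in docs.items():
        for off in range(0, len(text) - n + 1, step):
            back_end[text[off:off + m]].append((doc_id, off))
    front_end = defaultdict(list)   # n-gram -> [(subsequence, offset in subsequence)]
    for sub in back_end:
        for off in range(len(sub) - n + 1):   # 1-sliding extraction
            front_end[sub[off:off + n]].append((sub, off))
    return back_end, front_end


def ngram_occurrences(back_end, front_end, gram):
    """Resolve an n-gram to (doc_id, offset) pairs through both levels."""
    hits = []
    for sub, off_in_sub in front_end.get(gram, ()):
        for doc_id, off_in_doc in back_end[sub]:
            hits.append((doc_id, off_in_doc + off_in_sub))
    return sorted(hits)
```

The space saving comes from the front-end index: an n-gram's offset inside a subsequence is stored once per distinct subsequence, however many times that subsequence occurs across the documents. The overlap of n−1 characters ensures that no n-gram straddling two subsequences is lost.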

    METHOD OF STORING DATA INTO FLASH MEMORY IN A DBMS-INDEPENDENT MANNER USING THE PAGE-DIFFERENTIAL
    5.
    Document type: Patent application
    Legal status: In force

    Publication No.: US20100241790A1

    Publication date: 2010-09-23

    Application No.: US12507946

    Filing date: 2009-07-23

    IPC classification: G06F12/02 G06F12/00

    Abstract: The present invention proposes an effective and efficient data storage method, called page-differential logging, for flash-based storage systems. The primary characteristics of the invention are: (1) it writes only the page-differential, which is defined as the difference between an original page in flash memory and an up-to-date page in memory; (2) it computes and writes the page-differential only when an updated page needs to be reflected into flash memory. When an updated page needs to be reflected into flash memory, the present invention stores the page into a base page and a differential page in flash memory. When a page is recreated from flash memory, it reads the base page and the differential page, and then creates the page by merging the base page with its page-differential in the differential page. This invention significantly improves I/O performance of flash-based storage systems compared with existing page-based and log-based methods.

    Two-level n-gram index structure and methods of index building, query processing and index derivation
    7.
    Document type: Patent application
    Legal status: In force

    Publication No.: US20070050384A1

    Publication date: 2007-03-01

    Application No.: US11501265

    Filing date: 2006-08-09

    IPC classification: G06F7/00

    CPC classification: G06F17/30622

    Abstract: Disclosed are a structure of a two-level n-gram inverted index and methods of building the index, processing queries, and deriving the index, which reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index using subsequences extracted from documents as terms and a front-end inverted index using n-grams extracted from the subsequences as terms. The back-end inverted index uses, as terms, subsequences of a specific length extracted from the documents so that they overlap each other by n−1 (n: the length of an n-gram), and stores position information of the subsequences occurring in the documents in a posting list for each subsequence. The front-end inverted index uses, as terms, n-grams of a specific length extracted from the subsequences using a 1-sliding technique, and stores position information of the n-grams occurring in the subsequences in a posting list for each n-gram.

    Subsequence matching method using duality in constructing windows in time-series databases
    8.
    Document type: Granted patent
    Legal status: Lapsed

    Publication No.: US06496817B1

    Publication date: 2002-12-17

    Application No.: US09559673

    Filing date: 2000-04-27

    IPC classification: G06F17/30

    Abstract: A subsequence matching method for time-series databases reduces the number of points stored in the multidimensional index, and can store individual points directly in the index, by dividing the data sequence into disjoint windows using duality in constructing windows. The method reduces false alarms and improves performance by searching the index using the individual points that represent sliding windows of the query sequence and by comparing the points used in the query with the points stored in the index. Moreover, the method can create the index much faster than the previous method by reducing the number of calls to the feature extraction function, which accounts for a major part of the CPU overhead in index creation.
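
The window construction in this abstract (disjoint windows over the data sequence, sliding windows over the query) can be illustrated with a toy sketch. It departs from the patent in important ways, all of them assumptions made for brevity: the function names are hypothetical, windows are compared for exact equality rather than by similarity, and a plain dictionary replaces feature extraction and the multidimensional index.

```python
def disjoint_windows(seq, w):
    """Non-overlapping windows of size w, as used for the data sequence."""
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def sliding_windows(seq, w):
    """Every window of size w, as used for the query sequence."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def candidate_offsets(data, query, w):
    """Data offsets where the query may occur (exact-match toy version)."""
    index = {}   # window contents -> offsets of the disjoint data windows
    for d_off, win in disjoint_windows(data, w):
        index.setdefault(tuple(win), []).append(d_off)
    candidates = set()
    for q_off, win in sliding_windows(query, w):
        for d_off in index.get(tuple(win), ()):
            start = d_off - q_off   # align the query so the two windows coincide
            if 0 <= start <= len(data) - len(query):
                candidates.add(start)
    return sorted(candidates)
```

A data sequence of length L contributes only about L/w index entries instead of L−w+1, which is the reduction in stored points that the abstract mentions. Any occurrence of a query of length at least 2w−1 fully contains at least one disjoint data window, so one of the query's sliding windows will line up with it. Candidates still have to be verified against the data.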
