Single pass space efficient system and method for generating an approximate quantile in a data set having an unknown size
    1.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US06343288B1

    Publication date: 2002-01-29

    Application No.: US09268089

    Filing date: 1999-03-12

    IPC classification: G06F17/30

    Abstract: A space-efficient system and method for generating an approximate φ-quantile data element of a data set in a single pass over the data set, without a priori knowledge of the size of the data set. The approximate φ-quantile is guaranteed to lie within a user-specified approximation error ε of the true quantile being sought with a probability of at least 1−δ, where δ is a user-defined probability of failure. Initially, b buffers, each having a capacity of k elements, are filled with elements from the data set, with the values of b and k depending on the approximation error ε and the probability δ. The buffers are then collapsed into an output buffer; the remaining buffers are then refilled with elements and collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output buffer remains. The element of that buffer corresponding to the approximate quantile is then output as the approximate quantile. In later iterations (when the height of the tree is at least equal to a predetermined height that depends on δ and ε), the data is sampled non-uniformly to populate the buffers and achieve the desired performance. Parallel processors can be used, with the final output buffers of the processors being sent to a collecting processor P0 as input buffers to the collecting processor P0.
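The fill-and-collapse cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it omits the derivation of b and k from ε and δ as well as the non-uniform sampling phase, so it carries none of the stated probabilistic guarantees. The names `collapse` and `approx_quantile`, the default values of `b` and `k`, and the choice of selection offset are assumptions made for illustration.

```python
import math
from typing import Iterable, List, Tuple

Buffer = Tuple[int, List[float]]  # (weight, sorted elements)


def collapse(buffers: List[Buffer], k: int) -> Buffer:
    """Merge full buffers into one buffer of k elements carrying the summed weight."""
    total_weight = sum(weight for weight, _ in buffers)
    # Conceptually each element is repeated `weight` times in one sorted sequence.
    items = sorted((x, weight) for weight, elems in buffers for x in elems)
    offset = (total_weight + 1) // 2  # first selected position (1-indexed), an assumed choice
    out: List[float] = []
    seen = 0  # weighted count of elements consumed so far
    target = offset
    for x, weight in items:
        seen += weight
        # Select the elements at positions offset, offset + W, offset + 2W, ...
        while len(out) < k and target <= seen:
            out.append(x)
            target = len(out) * total_weight + offset
    return total_weight, out


def approx_quantile(stream: Iterable[float], phi: float, b: int = 5, k: int = 200) -> float:
    """Single pass over `stream` using at most b buffers of k elements each."""
    full: List[Buffer] = []
    current: List[float] = []
    for x in stream:
        current.append(x)
        if len(current) == k:
            full.append((1, sorted(current)))
            current = []
            if len(full) == b:
                # All buffers are full: collapse them into a single output buffer.
                full = [collapse(full, k)]
    if current:
        full.append((1, sorted(current)))  # final, partially filled buffer
    items = sorted((x, weight) for weight, elems in full for x in elems)
    total = sum(weight for _, weight in items)
    if total == 0:
        raise ValueError("empty stream")
    target = max(1, math.ceil(phi * total))
    seen = 0
    for x, weight in items:
        seen += weight
        if seen >= target:
            return x
    return items[-1][0]
```

When the stream holds fewer than b·k elements no collapse happens and the result is exact; for longer streams each collapse adds a rank error of roughly half the output buffer's weight.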

    System and method for hybrid hash join using over-partitioning to respond to database query
    2.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US06226639B1

    Publication date: 2001-05-01

    Application No.: US09158741

    Filing date: 1998-09-22

    IPC classification: G06F17/30

    Abstract: A system and method for joining a build table to a probe table in response to a query for data includes over-partitioning the build table into “N” build partitions using a uniform hash function and writing the build partitions into main memory of a database computer. When the main memory becomes full, one or more partitions are selected as victim partitions to be written to disk storage, and the process continues until all build table rows or tuples have either been written into main memory or spilled to disk. Then, a packing algorithm is used to initially designate never-spilled partitions as “winners” and spilled partitions as “losers”, and then to randomly select one or more winners for prospective swapping with one or more losers. The I/O savings associated with each prospective swap are determined, and if any savings would be realized, the winners are designated as losers and the losers are designated as winners. The swap determination can be made multiple times, e.g., 256, after which losers are moved entirely to disk and winners are moved entirely to memory. At the end of the swapping, probe table rows associated with winner partitions are joined to rows in the winner build partitions, while probe table rows associated with loser partitions are spilled to disk. Then, the loser build partitions are written to main memory for joining with the corresponding probe table partitions, to undertake the requested join of the build table and probe table in an I/O- and memory-efficient manner.
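The partition, spill, and two-phase join flow in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the winner/loser swapping step and its I/O-savings estimate are omitted, "disk" is simulated with in-memory lists, Python's built-in `hash` stands in for the uniform hash function, and the victim policy (spill the largest resident partition) is an assumption. The function name `hybrid_hash_join` and its parameters are hypothetical.

```python
from collections import defaultdict


def hybrid_hash_join(build_rows, probe_rows, n_partitions=8, memory_budget=6):
    """Join (key, value) rows; returns a list of (key, build_value, probe_value)."""
    if memory_budget < 1:
        raise ValueError("memory_budget must be at least 1 row")

    def part(key):
        return hash(key) % n_partitions  # stand-in for a uniform hash function

    memory = [[] for _ in range(n_partitions)]  # resident build partitions
    disk = [[] for _ in range(n_partitions)]    # simulated spill files
    spilled = set()
    in_memory = 0

    # Build phase: over-partition the build table, spilling victims when memory is full.
    for row in build_rows:
        p = part(row[0])
        if p not in spilled and in_memory >= memory_budget:
            victim = max((q for q in range(n_partitions) if q not in spilled),
                         key=lambda q: len(memory[q]))  # assumed policy: largest partition
            disk[victim].extend(memory[victim])
            in_memory -= len(memory[victim])
            memory[victim] = []
            spilled.add(victim)
        if p in spilled:
            disk[p].append(row)
        else:
            memory[p].append(row)
            in_memory += 1

    def hash_table(rows):
        table = defaultdict(list)
        for key, value in rows:
            table[key].append(value)
        return table

    # Probe phase 1: join rows that hit never-spilled ("winner") partitions,
    # and defer rows that hit spilled ("loser") partitions.
    winners = {p: hash_table(memory[p]) for p in range(n_partitions) if p not in spilled}
    deferred = [[] for _ in range(n_partitions)]
    results = []
    for key, probe_value in probe_rows:
        p = part(key)
        if p in winners:
            results.extend((key, bv, probe_value) for bv in winners[p].get(key, ()))
        else:
            deferred[p].append((key, probe_value))

    # Probe phase 2: reload each loser build partition and join its deferred probe rows.
    for p in sorted(spilled):
        table = hash_table(disk[p])
        for key, probe_value in deferred[p]:
            results.extend((key, bv, probe_value) for bv in table.get(key, ()))
    return results
```

Using more partitions than strictly needed keeps each victim small, so a spill evicts only a small share of the build table at a time.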

    Index partition maintenance over monotonically addressed document sequences
    3.
    Type: Granted patent
    Status: In force

    Publication No.: US08738673B2

    Publication date: 2014-05-27

    Application No.: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the one or more physical partitions to a node in a cluster of nodes; for each received document, assigning an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off of assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions by inserting each new document into a specific one of the physical partitions in the second set using one or more functions that direct the placement based on one of the assigned-doc-ID, a field value derived from a set of fields obtained from the document, and a combination of the assigned-doc-ID and the field value.
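A small Python sketch may help picture how monotonically assigned document IDs and epoch cut-offs interact. It is only an illustration under assumptions: the class name `EpochedIndex`, the partition names, and the placement function (document ID modulo the number of partitions in the epoch) are hypothetical, and field-value-based placement is not shown.

```python
import bisect
from typing import Dict, List, Tuple


class EpochedIndex:
    """Routes documents to partitions; each epoch owns its own partition set."""

    def __init__(self, partitions: List[str]) -> None:
        self._next_doc_id = 0
        self._epoch_starts: List[int] = [0]  # first doc ID of each epoch (the cut-offs)
        self._epoch_partitions: List[List[str]] = [list(partitions)]
        self.placement: Dict[str, List[int]] = {p: [] for p in partitions}

    def start_new_epoch(self, partitions: List[str]) -> None:
        """Cut off the current epoch: later doc IDs go to the new partition set."""
        self._epoch_starts.append(self._next_doc_id)
        self._epoch_partitions.append(list(partitions))
        for p in partitions:
            self.placement.setdefault(p, [])

    def partition_for(self, doc_id: int) -> str:
        """Recover a document's partition from its ID alone."""
        if not 0 <= doc_id < self._next_doc_id:
            raise ValueError("unknown document ID")
        epoch = bisect.bisect_right(self._epoch_starts, doc_id) - 1
        parts = self._epoch_partitions[epoch]
        return parts[doc_id % len(parts)]  # assumed placement function

    def add_document(self) -> Tuple[int, str]:
        """Assign the next integer doc ID and place the document."""
        doc_id = self._next_doc_id
        self._next_doc_id += 1
        partition = self.partition_for(doc_id)
        self.placement[partition].append(doc_id)
        return doc_id, partition
```

Because IDs only grow, the list of epoch start IDs is sorted, so a binary search is enough to find the epoch (and hence the partition set) of any existing document without moving older documents.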

    Adaptive Evaluation of Text Search Queries With Blackbox Scoring Functions
    5.
    Type: Patent application
    Status: Lapsed

    Publication No.: US20070150467A1

    Publication date: 2007-06-28

    Application No.: US11561949

    Filing date: 2006-11-21

    IPC classification: G06F17/30

    CPC classification: G06F17/30672

    Abstract: Disclosed is an evaluation technique for text search with black-box scoring functions, where it is unnecessary for the evaluation engine to maintain details of the scoring function. Included are a description of a system for dealing with black-box searching, proofs of correctness, and experimental evidence showing that the performance of the technique is comparable in efficiency to that of techniques used in custom-built engines.

    Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures
    7.
    Type: Granted patent
    Status: In force

    Publication No.: US06334131B2

    Publication date: 2001-12-25

    Application No.: US09143733

    Filing date: 1998-08-29

    IPC classification: G06F17/30

    Abstract: A method for cataloging, filtering and ranking information, for example, World Wide Web pages of the Internet. The method is preferably implemented in computer software and features steps for enabling a user to interactively create an information database including preferred information elements such as preferred-authority World Wide Web pages. The method includes steps for enabling a user to interactively create a frame-based, hierarchical organizational structure for the information elements, and steps for identifying, and automatically filtering and ranking by relevance, information elements such as World Wide Web pages for populating the structure, to form, for example, a searchable World Wide Web page database. Additionally, the method features steps for enabling a user to interactively define a frame-based, hierarchical information structure for cataloging information; identifying a preliminary population of information elements for a particular hierarchical category arranged as a frame, based upon the respective frame attributes; thereafter expanding the information population to include related information; subsequently automatically filtering and ranking the information based upon relevance; and then populating the hierarchical structure with a definable portion of the filtered, ranked information elements.

    KNOWLEDGE-BASED DATA MINING SYSTEM
    8.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20120259890A1

    Publication date: 2012-10-11

    Application No.: US13526424

    Filing date: 2012-06-18

    IPC classification: G06F17/30

    CPC classification: G06F16/951 G06F2216/03

    Abstract: In a data mining system, data is gathered into a data store using, e.g., a Web crawler. The data is classified into entities. Data miners use rules to process the entities and append respective keys to the entities representing characteristics of the entities as derived from rules embodied in the miners. With these keys, characteristics of entities as defined by disparate expert authors of the data miners are identified for use in responding to complex data requests from customers.
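The miner-and-key arrangement described in the abstract might look roughly like the following sketch. All names here (`price_miner`, `contact_miner`, `run_miners`, `query`, and the key strings) are hypothetical, and the rules are deliberately trivial; the abstract does not specify how rules or keys are represented.

```python
from typing import Any, Callable, Dict, List

Entity = Dict[str, Any]
Miner = Callable[[Entity], List[str]]  # a miner returns the keys its rules derive


def price_miner(entity: Entity) -> List[str]:
    """Hypothetical rule: flag entities whose text mentions a dollar amount."""
    return ["has_price"] if "$" in entity["text"] else []


def contact_miner(entity: Entity) -> List[str]:
    """Hypothetical rule: flag entities whose text contains an e-mail-like token."""
    return ["has_email"] if any("@" in token for token in entity["text"].split()) else []


def run_miners(entities: List[Entity], miners: List[Miner]) -> List[Entity]:
    """Let every miner append its keys to every entity."""
    for entity in entities:
        keys = entity.setdefault("keys", [])
        for miner in miners:
            keys.extend(miner(entity))
    return entities


def query(entities: List[Entity], required_keys: List[str]) -> List[Entity]:
    """Answer a request by combining keys written by independent miners."""
    wanted = set(required_keys)
    return [e for e in entities if wanted <= set(e.get("keys", []))]
```

Since each miner only appends keys, miners written by different authors can be combined in a single request without knowing about one another.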

    Virtual cursors for XML joins
    10.
    Type: Granted patent
    Status: In force

    Publication No.: US07685138B2

    Publication date: 2010-03-23

    Application No.: US11270784

    Filing date: 2005-11-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30935

    Abstract: A system, method, and computer program product to improve XML query processing efficiency with virtual cursors. Structural joins are a fundamental operation in XML query processing, and substantial work exists on index-based algorithms for executing them. Two well-known index features, path indices and ancestor information, are combined in a novel way to replace at least some of the physical index cursors in a structural join with virtual cursors. The position of a virtual cursor is derived from the path and ancestor information of a physical cursor. Virtual cursors can be easily incorporated into existing structural join algorithms. By eliminating index I/O and the processing cost of handling physical inverted lists, virtual cursors can improve the performance of holistic path queries by an order of magnitude or more.
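The idea of deriving an ancestor's position instead of reading its inverted list can be sketched as below. The sketch assumes that "ancestor information" is a Dewey-style identifier (each element's ID is its parent's ID plus a child ordinal) and that the path index stores the root-to-element tag path with each posting; both encodings, along with the names `virtual_ancestors` and `structural_join`, are assumptions for illustration rather than details taken from the abstract.

```python
from typing import List, Tuple

Dewey = Tuple[int, ...]
Posting = Tuple[str, Dewey]  # (root-to-element tag path, Dewey ID of the element)


def virtual_ancestors(posting: Posting, tag: str) -> List[Dewey]:
    """Positions of all proper ancestors named `tag`, derived without any index read."""
    path, dewey = posting
    steps = path.strip("/").split("/")
    # The i-th path step corresponds to the Dewey prefix of length i + 1.
    return [dewey[: i + 1] for i, step in enumerate(steps[:-1]) if step == tag]


def structural_join(ancestor_tag: str,
                    descendant_postings: List[Posting]) -> List[Tuple[Dewey, Dewey]]:
    """Evaluate ancestor_tag//descendant using only the descendant's physical cursor."""
    pairs = []
    for posting in descendant_postings:  # the single physical cursor
        for ancestor in virtual_ancestors(posting, ancestor_tag):  # the virtual cursor
            pairs.append((ancestor, posting[1]))
    return pairs
```

For example, with postings for tag `c` at `/a/b/c` (ID `(1, 1, 1)`), `/a/c` (ID `(1, 2)`), and `/x/c` (ID `(2, 1)`), joining on ancestor `a` yields the first two elements paired with `(1,)` and never touches the inverted list for `a`.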
