1. Single pass space efficient system and method for generating an approximate quantile in a data set having an unknown size
    Granted invention patent; status: lapsed

    Publication No.: US06343288B1

    Publication date: 2002-01-29

    Application No.: US09268089

    Filing date: 1999-03-12

    IPC classification: G06F17/30

    Abstract: A space-efficient system and method for generating an approximate φ-quantile data element of a data set in a single pass over the data set, without a priori knowledge of the size of the data set. The approximate φ-quantile is guaranteed to lie within a user-specified approximation error ε of the true quantile being sought with a probability of at least 1−δ, where δ is a user-defined probability of failure. Initially, b buffers, each having a capacity of k elements, are filled with elements from the data set, with the values of b and k depending on the approximation error ε and the probability δ. The buffers are then collapsed into an output buffer; the remaining buffers are refilled with elements and collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output buffer remains. The element of the output buffer corresponding to the approximate quantile is then output as the approximate quantile. In later iterations (once the height of the collapse tree reaches a predetermined height that depends on δ and ε), the data is sampled non-uniformly to populate the buffers so that the desired guarantees are achieved. Parallel processors can be used, with the final output buffers of the processors being sent to a collecting processor P0 as its input buffers.
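
    The abstract does not spell out the sampling schedule, so the following is only a minimal Python sketch, under a hypothetical doubling schedule, of how later buffers might be populated by non-uniform sampling: each buffer slot holds one element picked at random from a block of r consecutive inputs, and the buffer carries weight r. The function name `sampled_buffers`, the parameter `h` (simplified here to "after 2**h buffers have been filled") and the exact doubling rule are assumptions, not details taken from the patent.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Tuple


def sampled_buffers(data: Iterable[float], k: int, h: int,
                    rng: random.Random) -> Iterator[Tuple[List[float], int]]:
    """Yield (sorted buffer, weight) pairs filled by non-uniform sampling.

    Hypothetical schedule: the first 2**h buffers keep every element (r = 1);
    after that, the sampling rate r doubles each time the number of filled
    buffers doubles. Each slot holds one element chosen uniformly at random
    from a block of r consecutive inputs, so the buffer gets weight r.
    """
    it = iter(data)
    filled = 0
    while True:
        r = 1 if filled < 2 ** h else 2 ** (filled.bit_length() - h)
        buf: List[float] = []
        for _ in range(k):
            block = list(itertools.islice(it, r))
            if not block:  # input exhausted
                break
            # Simplification: a short final block still counts with weight r.
            buf.append(rng.choice(block))
        if not buf:
            return
        yield sorted(buf), r
        filled += 1
        if len(buf) < k:  # last, partially filled buffer
            return
```

    The weighted buffers it yields could then be collapsed like unweighted ones, with each element counted r times.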

2. Single pass space efficent system and method for generating approximate quantiles satisfying an apriori user-defined approximation error
    Granted invention patent; status: lapsed

    Publication No.: US6108658A

    Publication date: 2000-08-22

    Application No.: US50434

    Filing date: 1998-03-30

    IPC classification: G06F7/22 G06F17/30

    Abstract: A system and method for finding an ε-approximate φ-quantile data element of a data set with N data elements in a single pass over the data set. The ε-approximate φ-quantile data element is guaranteed to lie within a user-specified approximation error ε of the true φ-quantile data element being sought. Initially, b buffers, each having a capacity of k elements, are filled with sorted data elements from the data set, with the values of b and k depending on ε and N. The buffers are then collapsed into an output buffer; the remaining buffers are refilled with data elements and collapsed (along with the previous output buffer), and so on until the entire data set has been processed and a single output buffer remains. The data element of the output buffer corresponding to the ε-approximate φ-quantile is then output as the approximate φ-quantile data element. If desired, the system and method can be combined with sampling to reduce further the amount of space required to find a desired ε-approximate φ-quantile data element.
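
    The fill-and-collapse loop described above can be sketched in Python as follows. This is a minimal illustration assuming the simple policy the abstract describes (collapse every buffer, including the previous output, whenever all b buffers are full); the helper names, the weighted-merge offsets used when collapsing, and the handling of a partially filled last buffer are assumptions. It does not derive b and k from ε and N.

```python
import math
from typing import Iterable, List, Tuple

Buffer = Tuple[List[float], int]  # (sorted elements, weight)


def _weighted_select(buffers: List[Buffer], positions: List[int]) -> List[float]:
    """Return the elements at ascending 1-based positions of the weighted merge,
    in which each element of a buffer with weight w is counted w times."""
    merged = sorted((x, w) for elems, w in buffers for x in elems)
    out: List[float] = []
    cum, i = 0, 0  # cum = total weight of merged[:i]
    for p in positions:
        while cum + merged[i][1] < p:
            cum += merged[i][1]
            i += 1
        out.append(merged[i][0])
    return out


def _collapse(buffers: List[Buffer], k: int) -> Buffer:
    """Collapse full buffers into one buffer of k elements whose weight is the sum."""
    total_w = sum(w for _, w in buffers)
    offset = (total_w + 1) // 2 if total_w % 2 else total_w // 2
    positions = [j * total_w + offset for j in range(k)]
    return _weighted_select(buffers, positions), total_w


def approx_quantile(data: Iterable[float], phi: float, b: int, k: int) -> float:
    """One pass over non-empty data, holding at most b buffers of k elements."""
    if b < 2:
        raise ValueError("need at least two buffers to collapse")
    full: List[Buffer] = []
    current: List[float] = []  # the buffer currently being filled
    for x in data:
        current.append(x)
        if len(current) == k:
            full.append((sorted(current), 1))
            current = []
            if len(full) == b:  # every buffer is full: collapse them all
                full = [_collapse(full, k)]
    if current:
        full.append((sorted(current), 1))  # partially filled last buffer
    n = sum(len(elems) * w for elems, w in full)  # equals the number of inputs
    rank = max(1, math.ceil(phi * n))
    return _weighted_select(full, [rank])[0]
```

    With b=4 and k=50, a pass over 1,000 values never holds more than 200 of them in buffers; each collapse adds at most about half its output weight to the rank error.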

3. Index partition maintenance over monotonically addressed document sequences
    Granted invention patent; status: in force

    Publication No.: US08738673B2

    Publication date: 2014-05-27

    Application No.: US12875615

    Filing date: 2010-09-03

    IPC classification: G06F17/30

    Abstract: Provided are techniques for partitioning a physical index into one or more physical partitions; assigning each of the physical partitions to a node in a cluster of nodes; assigning to each received document an assigned-doc-ID comprising an integer document identifier; and, in response to assigning the assigned-doc-ID to a document, determining a cut-off for the assignment of new documents to a current virtual-index-epoch comprising a first set of physical partitions, and placing the new documents into a new virtual-index-epoch comprising a second set of physical partitions. Each new document is inserted into a specific physical partition in the second set using one or more functions that direct the placement based on the assigned-doc-ID, on a field value derived from a set of fields obtained from the document, or on a combination of the assigned-doc-ID and the field value.
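
    A minimal Python sketch of epoch-based placement, assuming hypothetical names (`EpochRouter`, `add`, `locate`) and two illustrative placement functions: doc-ID modulo the partition count, or a CRC-32 hash of a field value. The combined doc-ID/field-value option is not shown. Because assigned-doc-IDs only grow, the epoch that owns a document can be found by binary search over the epoch cut-offs.

```python
import zlib
from bisect import bisect_right
from typing import List, Optional, Tuple


class EpochRouter:
    """Route documents to physical partitions by virtual-index epoch."""

    def __init__(self, partitions: List[str]) -> None:
        self.next_doc_id = 0
        self.epoch_starts: List[int] = [0]  # first assigned-doc-ID of each epoch
        self.epoch_parts: List[List[str]] = [list(partitions)]

    def start_epoch(self, partitions: List[str]) -> None:
        """Cut off the current epoch; later doc IDs go to the new partition set."""
        self.epoch_starts.append(self.next_doc_id)
        self.epoch_parts.append(list(partitions))

    @staticmethod
    def _place(parts: List[str], doc_id: int, field_value: Optional[str]) -> str:
        if field_value is None:
            return parts[doc_id % len(parts)]  # placement by doc ID
        h = zlib.crc32(field_value.encode("utf-8"))  # stable hash of a field value
        return parts[h % len(parts)]

    def add(self, field_value: Optional[str] = None) -> Tuple[int, str]:
        """Assign the next integer doc ID and pick its partition in the newest epoch."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        return doc_id, self._place(self.epoch_parts[-1], doc_id, field_value)

    def locate(self, doc_id: int, field_value: Optional[str] = None) -> str:
        """Doc IDs are monotonic, so a binary search over cut-offs finds the epoch."""
        epoch = bisect_right(self.epoch_starts, doc_id) - 1
        return self._place(self.epoch_parts[epoch], doc_id, field_value)
```

    Documents indexed before a cut-off keep their old placement, so starting a new epoch does not force existing partitions to be rebalanced.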

4. System and method for hybrid hash join using over-partitioning to respond to database query
    Granted invention patent; status: lapsed

    Publication No.: US06226639B1

    Publication date: 2001-05-01

    Application No.: US09158741

    Filing date: 1998-09-22

    IPC classification: G06F17/30

    Abstract: A system and method for joining a build table to a probe table in response to a query for data includes over-partitioning the build table into “N” build partitions using a uniform hash function and writing the build partitions into main memory of a database computer. When main memory becomes full, one or more partitions are selected as victim partitions to be written to disk storage, and the process continues until all build table rows (tuples) have either been written into main memory or spilled to disk. Then, a packing algorithm initially designates never-spilled partitions as “winners” and spilled partitions as “losers”, and randomly selects one or more winners for prospective swapping with one or more losers. The I/O savings associated with each prospective swap are determined, and if any savings would be realized, the winners are designated as losers and the losers as winners. The swap determination can be made multiple times, e.g., 256 times, after which losers are moved entirely to disk and winners are moved entirely to memory. At the end of the swapping, probe table rows associated with winner partitions are joined to rows in the winner build partitions, while probe table rows associated with loser partitions are spilled to disk. Then, the loser build partitions are brought into main memory for joining with the corresponding probe table partitions, so that the requested join of the build table and probe table is carried out in an I/O- and memory-efficient manner.
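
    The randomized winner/loser packing step can be sketched in Python as below. It is a minimal illustration under stated assumptions: swaps are one-for-one, partition sizes are in arbitrary units, and the I/O cost of spilling a partition is assumed to be 2 × (build size + probe size), for writing and re-reading both sides. The names `spill_io` and `pack_partitions` are hypothetical.

```python
import random
from typing import Iterable, Optional, Sequence, Set


def spill_io(i: int, build: Sequence[int], probe: Sequence[int]) -> int:
    """Assumed I/O cost of spilling partition i: write and re-read both sides."""
    return 2 * (build[i] + probe[i])


def pack_partitions(build: Sequence[int], probe: Sequence[int], memory: int,
                    spilled: Iterable[int], rounds: int = 256,
                    rng: Optional[random.Random] = None) -> Set[int]:
    """Return the winner partitions (kept in memory) after randomized swaps."""
    rng = rng or random.Random()
    losers = set(spilled)
    winners = set(range(len(build))) - losers  # never-spilled partitions
    used = sum(build[i] for i in winners)  # memory held by winners
    for _ in range(rounds):
        if not winners or not losers:
            break
        winner = rng.choice(sorted(winners))
        loser = rng.choice(sorted(losers))
        fits = used - build[winner] + build[loser] <= memory
        saving = spill_io(loser, build, probe) - spill_io(winner, build, probe)
        if fits and saving > 0:  # swap the two roles
            winners.remove(winner)
            winners.add(loser)
            losers.remove(loser)
            losers.add(winner)
            used += build[loser] - build[winner]
    return winners
```

    A partition whose probe side is large is the most expensive to spill, so the swaps tend to pull it into memory in exchange for a partition of similar build size but little probe traffic.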

5. Signature hash for checking versions of abstract data types
    Granted invention patent; status: lapsed

    Publication No.: US06973572B1

    Publication date: 2005-12-06

    Application No.: US09514607

    Filing date: 2000-02-28

    IPC classification: G06F17/30 G06F21/00 H04L9/28

    Abstract: A method, apparatus, and article of manufacture for providing a signature hash for checking versions of abstract data types. An identifier that is substantially unique to an abstract data type is constructed as a concatenation of various attributes of the abstract data type. The constructed identifier is hashed to generate a signature hash value for the abstract data type, which is then stored both in the database and in a class definition for the abstract data type. When the class definition is instantiated as a library function, it accesses the abstract data type from the database and compares the signature hash value from the database with the signature hash value from the class definition in order to verify that the class definition is not outdated. The class definition is outdated when the abstract data type has been altered without the signature hash value being regenerated and re-stored in both the database and the class definition.
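
    A minimal Python sketch of the version check, assuming details the abstract leaves open: the identifier is the type name joined with `name:type` pairs using `|`, the hash is a truncated SHA-256 (the patent names no hash function), and a plain dict called `CATALOG` stands in for the signature stored in the database. `Address` is a hypothetical abstract data type.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical stand-in for the signature values stored in the database.
CATALOG: Dict[str, str] = {}


def signature_hash(type_name: str, attributes: List[Tuple[str, str]]) -> str:
    """Hash a concatenation of the ADT's name and its (attribute, type) pairs."""
    identifier = type_name + "|" + "|".join(f"{n}:{t}" for n, t in attributes)
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()[:16]


ADDRESS_ATTRS = [("street", "VARCHAR(40)"), ("zip", "CHAR(5)")]


class Address:
    """Class definition generated for the ADT, carrying its own copy of the hash."""

    SIGNATURE = signature_hash("Address", ADDRESS_ATTRS)

    def __init__(self, street: str, zip_code: str) -> None:
        # Compare the hash stored in the "database" with the class's own copy.
        if CATALOG.get("Address") != self.SIGNATURE:
            raise RuntimeError("class definition for Address is outdated")
        self.street = street
        self.zip_code = zip_code
```

    Altering the type in the catalog (for example, adding an attribute) changes its stored hash, so instantiating the stale class definition fails instead of silently misreading the data.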

6. System and method for hash loops join of data using outer join and early-out join
    Granted invention patent; status: lapsed

    Publication No.: US06253197B1

    Publication date: 2001-06-26

    Application No.: US09167395

    Filing date: 1998-10-06

    IPC classification: G06F17/30

    Abstract: A system and method for joining a build table to a probe table in response to a query for data includes executing a hash loops join of the build table and the probe table. Rows that match each other by satisfying a join predicate are joined and output. In an outer join, unmatched rows of the probe table are joined to NULL build table field values and output, so that every row of the probe table is output regardless of whether it has a matching row in the build table. In an early-out join, on the other hand, a “match once” table is designated as the probe table, and, in response to a query for unique probe table outputs, a probe table row that has been joined and output once is prevented from being joined to any other row of the other table, regardless of whether it might match other rows. In both the hash loops early-out join and the hash loops outer join, when the build table is larger than main memory, the roles of the build and probe tables are reversed.
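
    The three join variants can be contrasted with a minimal in-memory Python sketch. It assumes an equality join predicate on one column per side, uses `None` as a stand-in for NULL build-table values, and uses hypothetical `mode` strings; the role reversal for a build table larger than main memory is not shown.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Any, ...]


def hash_loops_join(build: Iterable[Row], probe: Iterable[Row], build_key: int,
                    probe_key: int, mode: str = "inner"
                    ) -> Iterator[Tuple[Row, Optional[Row]]]:
    """Equality join; mode is "inner", "outer" (probe side preserved) or "early_out"."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for b in build:  # build phase: hash the build rows on the join key
        table[b[build_key]].append(b)
    for p in probe:  # probe phase: loop over the probe rows
        matches = table.get(p[probe_key], [])
        if not matches:
            if mode == "outer":
                yield p, None  # None stands in for NULL build columns
            continue
        for b in matches:
            yield p, b
            if mode == "early_out":  # "match once": stop after the first match
                break
```

    The early-out variant suits queries such as semi-joins, where only the existence of one match per probe row matters.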

7. Database management system, method and program for supporting the mutation of a composite object without read/write and write/write conflicts
    Granted invention patent; status: lapsed

    Publication No.: US5857182A

    Publication date: 1999-01-05

    Application No.: US786605

    Filing date: 1997-01-21

    IPC classification: G06F17/30

    Abstract: The system, method, and program of this invention avoid potential write/write and read/write conflicts when a subcomponent of a composite object (e.g., an ADT) is mutated. The embodiments of this invention define a copy semantic for the mutation function. In one embodiment, a copy function is inserted before any mutation function. In another embodiment, a global compile-time analysis is performed to determine whether a write/write or read/write conflict exists, and to eliminate redundant copy constructors if a conflict does exist. In a preferred embodiment, only a local analysis is performed during the parsing phase, thereby avoiding a global compile-time analysis. A mutation safe flag is associated with each parse tree node. The flag of a read-target leaf parse tree node is set to false, while non-leaf parse tree nodes (functions) derive their flag value from an incoming node, except that constructor and copy constructor functions are always true. Whether a copy of the composite object is made (i.e., whether a copy constructor is inserted) before a mutation is determined by the settings of the mutation safe flags, as follows. If the mutation safe flag for a mutation function is false, a copy constructor is inserted for the mutated composite object and the mutation safe flag is set to true. In addition, for update and trigger statements, the mutation safe flag for a mutated target defaults to true. Furthermore, related update entries are grouped together and a copy is generated for their common target. The generated copy is used as the common target for all of the mutations caused by the grouped update entries, so that all of the desired mutations accumulate in the same copy of the composite object.
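
    The local, parse-phase flag analysis can be sketched in Python as a bottom-up pass over a toy parse tree. The node kinds (`column`, `constructor`, `copy`, `func`, `mutator`) and the names `Node` and `annotate` are hypothetical; treating an argument-less function as unsafe is a conservative assumption of this sketch, and the update/trigger defaults and grouping of update entries are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str  # "column", "constructor", "copy", "func" or "mutator"
    name: str
    children: List["Node"] = field(default_factory=list)
    safe: bool = False  # the mutation-safe flag


def annotate(node: Node) -> Node:
    """Bottom-up pass that sets mutation-safe flags and inserts copy nodes."""
    node.children = [annotate(c) for c in node.children]
    if node.kind == "column":  # read-target leaf: not safe to mutate in place
        node.safe = False
    elif node.kind in ("constructor", "copy"):
        node.safe = True  # a freshly built object is always safe
    else:
        # Other functions inherit the flag of their incoming (first) argument;
        # a function without arguments is treated conservatively as unsafe.
        node.safe = node.children[0].safe if node.children else False
        if node.kind == "mutator" and not node.safe and node.children:
            node.children[0] = Node("copy", "copy", [node.children[0]], safe=True)
            node.safe = True
    return node
```

    In a chain such as `set_y(set_x(p))`, only the inner mutation receives a copy; the outer one already works on that copy, so no redundant copy constructor is inserted.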

8. Method, system and program for executing a query having a union operator
    Granted invention patent; status: in force

    Publication No.: US07539667B2

    Publication date: 2009-05-26

    Application No.: US10982441

    Filing date: 2004-11-05

    IPC classification: G06F17/30

    Abstract: Disclosed is a data processing system implemented method, a data processing system and an article of manufacture for executing a query having a union operator. The data processing system implemented method directs the data processing system to execute a query against a database having data objects. The query has sub-queries and a union operator, which is operable on the sub-queries associated with the query. The database is operatively coupled to the data processing system. The method includes grouping the sub-queries of the union operator according to identified structural similarities, the identified structural similarities being based on an analysis of the sub-queries; grouping the data objects of the database according to the grouped sub-queries; replacing the grouped data objects and any sub-queries associated with them with a reference to a representative data object and a representative sub-query; and accessing at least one member of the grouped data objects based on the reference.
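
    One way to picture the grouping step is the Python sketch below, which uses the standard-library `sqlite3` module. It assumes that "structurally similar" means "identical once the table name is replaced by a placeholder", treats the union as UNION ALL (no duplicate elimination), and runs the representative sub-query once per grouped table. The function names and the `{table}` placeholder are hypothetical, and table names are interpolated directly, so they must be trusted identifiers.

```python
import re
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


def group_branches(branches: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group UNION ALL branches that are identical once the table name is abstracted.

    Each branch is (sub-query text, table it reads). The key of each group is a
    representative sub-query with a {table} placeholder; the value lists the
    grouped data objects (tables).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sql, table in branches:
        template = re.sub(rf"\b{re.escape(table)}\b", "{table}", sql)
        groups[template].append(table)
    return dict(groups)


def run_grouped(conn: sqlite3.Connection, groups: Dict[str, List[str]]) -> List[tuple]:
    """Execute each representative sub-query once per table in its group."""
    rows: List[tuple] = []
    for template, tables in groups.items():
        for table in tables:
            rows.extend(conn.execute(template.format(table=table)).fetchall())
    return rows
```

    A real optimizer would compile each representative sub-query once; this loop only shows how the grouped data objects are reached through the shared template.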

9. Method, system and program for executing a query having a UNION operator
    Granted invention patent; status: lapsed

    Publication No.: US07409385B2

    Publication date: 2008-08-05

    Application No.: US10982337

    Filing date: 2004-11-05

    IPC classification: G06F7/00 G06F17/00

    Abstract: Disclosed is a data processing system implemented method, a data processing system and an article of manufacture for executing a query having a union operator. The data processing system implemented method directs the data processing system to process a query against data objects, which are operatively coupled to the data processing system. The query includes a parent operator that references a union operator; the union operator references sub-queries, and the sub-queries reference the data objects. The method includes noting a set of partitionings for the union operator, the noted set of partitionings being based on the sub-queries and on the data objects referenced by the sub-queries, and executing the query having the union operator based on the noted set of partitionings and the parent operator.
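
    As one possible reading of "noting a set of partitionings", the Python sketch below assumes each UNION ALL branch reads a data object that holds a disjoint key range on a common column. If the parent operator groups on that column, it can aggregate branch by branch instead of over the whole union. The branch layout, the range metadata, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical branch description: (partition column, inclusive key range, rows),
# where each row is (key, value).
Branch = Tuple[str, Tuple[int, int], List[Tuple[int, int]]]


def union_partitioning(branches: List[Branch]) -> Optional[str]:
    """Note the column the UNION output is partitioned on, if ranges are disjoint."""
    columns = {col for col, _, _ in branches}
    if len(columns) != 1:
        return None
    ranges = sorted(key_range for _, key_range, _ in branches)
    for (_, hi), (lo, _) in zip(ranges, ranges[1:]):
        if lo <= hi:  # overlapping ranges: no usable partitioning
            return None
    return columns.pop()


def sum_by_key(branches: List[Branch], group_col: str) -> Dict[int, int]:
    """Parent operator: SUM(value) GROUP BY key over the UNION ALL of branches."""
    result: Dict[int, int] = {}
    if union_partitioning(branches) == group_col:
        # Each key occurs in one branch only, so every branch is aggregated
        # independently and the partial results are concatenated unmerged.
        for _, _, rows in branches:
            partial: Dict[int, int] = {}
            for key, value in rows:
                partial[key] = partial.get(key, 0) + value
            result.update(partial)
    else:
        # Fallback: aggregate over the whole union at once.
        for _, _, rows in branches:
            for key, value in rows:
                result[key] = result.get(key, 0) + value
    return result
```

    Both paths return the same groups; the partitioned path simply never needs a hash table spanning more than one branch.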

10. Optimal storage mechanism for persistent objects in DBMS
    Granted invention patent; status: lapsed

    Publication No.: US6065013A

    Publication date: 2000-05-16

    Application No.: US914394

    Filing date: 1997-08-19

    IPC classification: G06F17/30

    Abstract: A method, apparatus, and article of manufacture for a computer-implemented storage mechanism for persistent objects in a database management system. A statement is executed by a computer to manipulate data in a database stored on a data storage device connected to the computer. It is determined that an object is to be stored in an inline buffer. When the object can be stored entirely in the inline buffer, it is stored there. When the object cannot be stored entirely in the inline buffer, a selected portion of the object is stored in the inline buffer and the remaining portion is stored as a large object.
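
    A minimal Python sketch of the inline-or-overflow decision, assuming that the "selected portion" kept inline is simply the leading bytes of the object and that a dict stands in for large-object storage. The class names `StoredObject` and `ObjectStore` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredObject:
    inline: bytes  # portion kept in the row's inline buffer
    lob_id: Optional[int]  # large-object locator, None if everything fit inline


class ObjectStore:
    def __init__(self, inline_capacity: int) -> None:
        self.inline_capacity = inline_capacity
        self.lobs: Dict[int, bytes] = {}  # stand-in for large-object storage

    def store(self, data: bytes) -> StoredObject:
        if len(data) <= self.inline_capacity:
            return StoredObject(data, None)  # fits entirely inline
        lob_id = len(self.lobs)
        # Assumed split: the leading bytes stay inline, the rest goes to a LOB.
        self.lobs[lob_id] = data[self.inline_capacity:]
        return StoredObject(data[:self.inline_capacity], lob_id)

    def load(self, obj: StoredObject) -> bytes:
        if obj.lob_id is None:
            return obj.inline
        return obj.inline + self.lobs[obj.lob_id]
```

    Small objects are then read with the row itself, and only oversized ones pay for the extra large-object access.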