-
Publication No.: US07958114B2
Publication date: 2011-06-07
Application No.: US12098178
Filing date: 2008-04-04
CPC classification: G06F17/30306, G06Q30/0202
Abstract: A database server may be configured to compute distinct page counts of pages accessed to execute operands of respective queries. The queries may be executed against a table comprised of the pages and having an index managed by the database server. The distinct page counts may be obtained by counting, as a part of the executing of the queries, distinct pages accessed during the execution of the queries.
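The counting idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: `PageCounter`, `index_lookup`, and the `row_locations` mapping (key to `(page_id, slot)`) are hypothetical names introduced here, and the "index" is just a dictionary. The sketch only shows distinct pages being counted as a side effect of executing a lookup.

```python
from typing import Dict, Iterable, List, Set, Tuple


class PageCounter:
    """Tracks the distinct page ids touched while a (hypothetical) operator runs."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def record(self, page_id: int) -> None:
        # A set makes repeated visits to the same page count only once.
        self._seen.add(page_id)

    @property
    def distinct_pages(self) -> int:
        return len(self._seen)


def index_lookup(
    row_locations: Dict[int, Tuple[int, int]],
    keys: Iterable[int],
    counter: PageCounter,
) -> List[Tuple[int, int]]:
    """Look up each key and record the page it lives on as part of execution.

    row_locations is an assumed stand-in for an index: key -> (page_id, slot).
    """
    rows: List[Tuple[int, int]] = []
    for key in keys:
        location = row_locations.get(key)
        if location is None:
            continue  # key not present; no page is fetched for it
        page_id, slot = location
        counter.record(page_id)
        rows.append((page_id, slot))
    return rows
```

For example, looking up three keys whose rows sit on two pages yields a distinct page count of 2, even though three rows are returned.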
-
Publication No.: US20100299367A1
Publication date: 2010-11-25
Application No.: US12469399
Filing date: 2009-05-20
IPC classification: G06F17/30
CPC classification: G06F16/24578, G06F16/245, G06F16/24535, G06F16/24539, G06F16/248, G06F16/43
Abstract: A keyword search is executed on a view of a database based on a Boolean keyword query. The view includes multiple text columns, and the keyword search is executed on each of the multiple text columns in the view. The output results from the keyword search on each of the text columns include tuple identifiers of one or more relevant tuples and a relevancy score for ranking the results of the keyword query.
-
Publication No.: US07707207B2
Publication date: 2010-04-27
Application No.: US11357665
Filing date: 2006-02-17
CPC classification: G06F17/30469, G06Q30/0283
Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, an appropriate skyline generating technique is utilized to produce a skyline representative of the received queries and its associated preferences.
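For readers unfamiliar with the skyline operator, the sketch below shows what a skyline is, using a simple nested-loop style computation. It is an illustration only: the abstract does not say which generating techniques the engine chooses among, and the cost-based selection it describes is not modeled here. The names `dominates` and `skyline`, and the convention that lower values are preferred in every dimension, are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumes lower is better in every dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate, so discard it
        # p survives: drop any candidates that p dominates, then keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window
```

With hypothetical `(price, distance)` pairs such as `(50, 8)`, `(60, 2)`, `(80, 1)`, `(70, 5)`, `(55, 9)`, the last two are dominated and the first three form the skyline.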
-
Publication No.: US07567949B2
Publication date: 2009-07-28
Application No.: US10238175
Filing date: 2002-09-10
CPC classification: G06F17/30536, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99934, Y10S707/99935
Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
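Two of the sampling semantics named in this abstract can be sketched for the unweighted case in a single pass over a stream. The abstract does not specify the algorithms used; the sketch below substitutes the well-known reservoir sampling method (Algorithm R) for WoR and a per-record Bernoulli trial for CF. The function names and the explicit `rng` parameter are assumptions for illustration.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform sample of k items without replacement, in one pass.

    The stream is never materialized; only the k-slot reservoir is kept.
    """
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: uniform over 0..i
            if j < k:
                reservoir[j] = item  # replace a slot with probability k/(i+1)
    return reservoir


def coin_flip_sample(stream: Iterable[T], p: float, rng: random.Random) -> Iterator[T]:
    """Independent coin flips: each item is kept with probability p."""
    for item in stream:
        if rng.random() < p:
            yield item
```

Both functions accept any iterator, so they could sit at the end of a generator pipeline without buffering its output. The sample size under CF is random, whereas the reservoir always returns `min(k, n)` items for a stream of `n` items.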
-
Publication No.: US07516149B2
Publication date: 2009-04-07
Application No.: US10929514
Filing date: 2004-08-30
CPC classification: G06F17/30303, Y10S707/99932, Y10S707/99933, Y10S707/99937, Y10S707/99942, Y10S707/99943, Y10S707/99945
Abstract: At least one implementation, described herein, detects fuzzy duplicates and eliminates such duplicates. Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon.
-
Publication No.: US20090083214A1
Publication date: 2009-03-26
Application No.: US11858920
Filing date: 2007-09-21
Applicant: Arnd C. Konig, Surajit Chaudhuri, Kenneth Church, Liying Sui
Inventor: Arnd C. Konig, Surajit Chaudhuri, Kenneth Church, Liying Sui
IPC classification: G06F17/30
CPC classification: G06F16/3331, G06F16/313
Abstract: Index structures and query processing framework that enforces a given threshold on the overhead of computing conjunctive keyword queries. This includes a keyword processing algorithm, logic to determine which indexes to materialize, and a probabilistic approach to reducing the overhead for determining which indexes to build. The index structures leverage the fact that the frequency distribution of natural-language text follows a power law. Given a document collection, a set of indexes is proposed for materialization so that the time for intersecting keywords does not exceed a given threshold Δ. When considering the associated space requirement, the additional indexes are limited. Materialization of such a set of indexes for reasonable values of Δ (e.g., the time required to scan 20% of the largest inverted index), at least for a collection of short documents is distributed by the power law.
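As background for this abstract, the sketch below shows the baseline operation whose cost the patent aims to bound: answering a conjunctive (AND) keyword query by intersecting inverted-index posting lists. It does not implement the threshold Δ, the choice of which extra indexes to materialize, or the probabilistic approach; `build_inverted_index` and `conjunctive_query` are hypothetical names, and whitespace tokenization is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each lowercased word to the sorted list of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def conjunctive_query(index: Dict[str, List[int]], keywords: Sequence[str]) -> List[int]:
    """Return ids of documents that contain every keyword."""
    postings = [index.get(k.lower(), []) for k in keywords]
    if not postings:
        return []
    # Start from the shortest posting list so intermediate results stay small.
    postings.sort(key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result.intersection_update(plist)
        if not result:
            break  # no document can match once the intersection is empty
    return sorted(result)
```

The intersection cost grows with the length of the posting lists involved, which is why very frequent keywords are the expensive case under a power-law word distribution.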
-
Publication No.: US07493316B2
Publication date: 2009-02-17
Application No.: US11296036
Filing date: 2005-12-07
IPC classification: G06F17/30
CPC classification: G06F17/30536, G06F17/30489, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99942
Abstract: A method of estimating results of a database query, the results are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
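The inverse-probability step in this abstract can be illustrated with a small sketch of an estimated SUM. It assumes each tuple is included independently with its own probability, which is one possible reading and not necessarily the patent's sampling scheme; how the probabilities are derived from a workload is not modeled. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_with_probabilities(
    values: Sequence[float],
    probs: Sequence[float],
    rng: random.Random,
) -> List[Tuple[float, float]]:
    """Keep each value independently with its own probability.

    Returns (value, probability) pairs so the estimator can reweight later.
    """
    return [(v, p) for v, p in zip(values, probs) if rng.random() < p]


def estimate_sum(sample: Sequence[Tuple[float, float]]) -> float:
    """Estimate the full-table SUM by scaling each sampled value by 1/p."""
    return sum(v / p for v, p in sample)
```

A value sampled with probability 0.5 stands in for two tuples' worth of contribution, so it is counted at twice its value. When every probability is 1.0, the estimate equals the exact sum.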
-
Publication No.: US07483918B2
Publication date: 2009-01-27
Application No.: US10914901
Filing date: 2004-08-10
IPC classification: G06F7/00
CPC classification: G06F17/30312, Y10S707/99933, Y10S707/99945, Y10S707/99948
Abstract: A monitoring component of a database server collects a subset of a query workload along with related statistics. A remote index tuning component uses the workload subset and related statistics to determine a physical design that minimizes the cost of executing queries in the workload subset while ensuring that queries omitted from the subset do not degrade in performance.
-
Publication No.: US07472107B2
Publication date: 2008-12-30
Application No.: US10601416
Filing date: 2003-06-23
IPC classification: G06F17/30
CPC classification: G06F17/30312, Y10S707/99932, Y10S707/99933, Y10S707/99945
Abstract: Integrating the partitioning of physical design structures with the physical design process can result in more efficient query execution. When candidate structures are evaluated for their relative benefit, one or more partitioning methods is associated with each structure so that the benefits of various partitioning methods are taken into consideration when the structures are selected for use by the database. A pool of partitioned candidate structures is formed by proposing and evaluating the benefit of candidate structures with associated partitioning on a per query basis. The selected partitioned candidates are then used to construct generalized structures with associated partitioning methods that are evaluated for their benefit over the workload. Those generalized structures are added to the pool of partitioned candidate structures. From this augmented pool of partitioned candidate structures, an optimal set of partitioned structures is enumerated for use by the database system.
-
Publication No.: US07249141B2
Publication date: 2007-07-24
Application No.: US10426235
Filing date: 2003-04-30
CPC classification: G06F17/30595, Y10S707/99932, Y10S707/99933, Y10S707/99943
Abstract: Layout in a database system is performed using workload information. Execution information for a workload is obtained. Cumulative access and co-access information for database objects is then assembled. A cost model is developed for quantitatively capturing the value of different layouts, and a search is performed for a recommended database layout. In one embodiment, a greedy search is performed which initially attempts to provide a layout that minimizes co-location of objects on storage objects, and then attempts to improve that layout via a greedy search.