TECHNIQUES FOR EXACT CARDINALITY QUERY OPTIMIZATION
    11.
    Patent application (status: active)

    Publication number: US20100235347A1

    Publication date: 2010-09-16

    Application number: US12404284

    Filing date: 2009-03-14

    CPC classification number: G06F17/30463

    Abstract: An exact cardinality query optimization system and method for optimizing a query having a plurality of expressions to obtain a cardinality-optimal query execution plan for the query. Embodiments of the system and method use various techniques to shorten the time necessary to obtain the cardinality-optimal query execution plan, that is, the query execution plan obtained when all cardinalities are exact. Embodiments of the system and method include a covering queries technique that leverages query execution feedback to obtain an unordered subset of relevant expressions for the query, an early termination technique that bounds the cardinality to determine whether the processing can be terminated before all of the expressions are executed, and an expressions ordering technique that finds an ordering of expressions that yields the greatest reduction in time to obtain the cardinality-optimal query execution plan.

    Scalable lookup-driven entity extraction from indexed document collections
    12.
    Patent application (status: active)

    Publication number: US20090319500A1

    Publication date: 2009-12-24

    Application number: US12144675

    Filing date: 2008-06-24

    CPC classification number: G06F17/30011 G06F17/278

    Abstract: A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each includes a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.
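
The filtering pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes whitespace tokenization, uses one token set per entity string as the simplest possible cover, and all function names are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in set(tokenize(text)):
            index[token].add(doc_id)
    return index


def candidate_doc_ids(index, token_sets):
    """Union, over all token sets, of the documents containing every token in the set."""
    ids = set()
    for tokens in token_sets:
        postings = [index.get(t, set()) for t in tokens]
        if postings:
            ids |= set.intersection(*postings)
    return ids


def contains_entity(text, entity):
    """True if the entity's tokens occur contiguously in the document."""
    doc_tokens, ent_tokens = tokenize(text), tokenize(entity)
    n = len(ent_tokens)
    return any(doc_tokens[i:i + n] == ent_tokens
               for i in range(len(doc_tokens) - n + 1))


def filter_documents(docs, entities):
    """Index lookup first, then an exact match check on the retrieved documents only."""
    index = build_inverted_index(docs)
    # Assumption: the covering token sets are simply the token set of each entity.
    token_sets = {frozenset(tokenize(e)) for e in entities}
    ids = candidate_doc_ids(index, token_sets)
    return {d: docs[d] for d in ids
            if any(contains_entity(docs[d], e) for e in entities)}
```

The index lookup only narrows the candidate set; a document can contain every token of an entity without containing the entity string itself, which is why the second filtering pass is still needed before entity recognition.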

    Query selectivity estimation with confidence interval
    13.
    Granted patent (status: active)

    Publication number: US07636707B2

    Publication date: 2009-12-22

    Application number: US10818730

    Filing date: 2004-04-06

    Abstract: Selectivity estimates are produced that meet a desired confidence threshold. To determine the confidence level of a given selectivity estimate for a query expression, the query expression is evaluated on a sample of tuples. A probability density function is derived based on the number of tuples in the sample that satisfy the query expression. The cumulative distribution for the probability density function is solved for the given threshold to determine a selectivity estimate at the given confidence value.
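
One way to make this concrete is a small sketch under stated assumptions. The abstract does not name the density function; the code below assumes a uniform prior on the selectivity, so that after k of n sampled tuples satisfy the expression the density is Beta(k + 1, n - k + 1). The function names are hypothetical. The cumulative distribution is evaluated with the standard binomial-tail identity for integer Beta parameters and solved for the threshold by bisection.

```python
from math import comb


def posterior_cdf(x, k, n):
    """CDF at x of Beta(k + 1, n - k + 1), computed as a binomial tail sum."""
    m = n + 1
    return sum(comb(m, j) * x ** j * (1 - x) ** (m - j)
               for j in range(k + 1, m + 1))


def selectivity_at_confidence(k, n, confidence, tol=1e-9):
    """Smallest selectivity s such that P(true selectivity <= s) >= confidence."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, k, n) < confidence:
            lo = mid   # not enough probability mass below mid yet
        else:
            hi = mid
    return (lo + hi) / 2
```

Under these assumptions, raising the confidence value pushes the estimate upward: with 10 matches in a sample of 100, the estimate at confidence 0.95 is larger than the one at 0.5, which sits close to the observed fraction.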

    EXAMPLE-DRIVEN DESIGN OF EFFICIENT RECORD MATCHING QUERIES
    15.
    Patent application (status: active)

    Publication number: US20080306945A1

    Publication date: 2008-12-11

    Application number: US11758202

    Filing date: 2007-06-05

    CPC classification number: G06F17/30533 G06F17/30495

    Abstract: Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (or matching) and negative (non-matching) examples to search through this space and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs are executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.

    Primitive operator for similarity joins in data cleaning
    16.
    Granted patent (status: active)

    Publication number: US07406479B2

    Publication date: 2008-07-29

    Application number: US11352141

    Filing date: 2006-02-10

    CPC classification number: G06F17/30442 Y10S707/99942 Y10S707/99943

    Abstract: A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” can be evaluated using a similarity function(s) chosen to suit the domain and/or application. Thus, the system facilitates generic domain-independent data cleansing. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, hamming distance, soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.
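
To illustrate how set overlap can support a similarity function, here is a minimal sketch rather than the SSJoin operator itself. It assumes each string is turned into a set of character 3-grams, computes pairwise overlaps through a simple element index, and derives Jaccard similarity from the overlap and the set sizes. All names are hypothetical.

```python
from collections import defaultdict


def qgrams(s, q=3):
    """Set of character q-grams of s; a string shorter than q is its own single element."""
    s = s.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)} or {s}


def overlaps(r_sets, s_sets):
    """Return {(r_id, s_id): |R set & S set|} for every pair sharing at least one element."""
    index = defaultdict(list)
    for s_id, elems in s_sets.items():
        for e in elems:
            index[e].append(s_id)
    counts = defaultdict(int)
    for r_id, elems in r_sets.items():
        for e in elems:
            for s_id in index.get(e, ()):
                counts[(r_id, s_id)] += 1
    return counts


def jaccard_join(r, s, threshold, q=3):
    """Pairs (r_id, s_id, similarity) whose Jaccard similarity is at least threshold."""
    r_sets = {k: qgrams(v, q) for k, v in r.items()}
    s_sets = {k: qgrams(v, q) for k, v in s.items()}
    result = []
    for (r_id, s_id), o in overlaps(r_sets, s_sets).items():
        # Jaccard expressed purely through overlap and set sizes.
        sim = o / (len(r_sets[r_id]) + len(s_sets[s_id]) - o)
        if sim >= threshold:
            result.append((r_id, s_id, sim))
    return sorted(result)
```

The same overlap counts could feed other predicates (for example, a minimum absolute overlap), which is the sense in which an overlap-based join can act as a building block.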

    Database aggregation query result estimator
    17.
    Granted patent (status: active)

    Publication number: US07363301B2

    Publication date: 2008-04-22

    Application number: US11246355

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
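
The steps in this abstract can be sketched for a SUM query as follows. This is an illustrative sketch under assumptions not stated in the source: the number of outliers is fixed in advance, the window slides over the sorted values, the remaining data is sampled uniformly without replacement, and the function names are hypothetical.

```python
import random
import statistics


def split_outliers(values, num_outliers):
    """Keep the lowest-variance window of sorted values; everything outside it is an outlier."""
    if not 0 <= num_outliers < len(values):
        raise ValueError("num_outliers must be in [0, len(values))")
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    start = min(range(num_outliers + 1),
                key=lambda i: statistics.pvariance(ordered[i:i + window]))
    inliers = ordered[start:start + window]
    outliers = ordered[:start] + ordered[start + window:]
    return inliers, outliers


def estimate_sum(values, num_outliers, sample_size, seed=0):
    """Exact sum over the outliers plus an extrapolated sum over a sample of the rest."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    inliers, outliers = split_outliers(values, num_outliers)
    rng = random.Random(seed)
    sample = rng.sample(inliers, min(sample_size, len(inliers)))
    extrapolated = len(inliers) * sum(sample) / len(sample)
    return sum(outliers) + extrapolated
```

Aggregating the outliers exactly keeps a handful of extreme values from dominating the sampling error, since the sample is drawn only from the low-variance remainder.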

    LOCALIZED MARKETING
    18.
    Patent application (status: under examination, published)

    Publication number: US20080005104A1

    Publication date: 2008-01-03

    Application number: US11427290

    Filing date: 2006-06-28

    CPC classification number: G06Q30/02 G06F16/9537

    Abstract: A localized marketing system is disclosed that provides discount offers to users that match merchant criteria including proximity. Further, a system for actively probing populations of users with different parameters and monitoring responses can be employed to collect data for identifying the best discounts and deadlines to offer to users to achieve desired results. Another aspect of the disclosure pertains to web searches and more particularly toward influencing resultant content to increase relevancy. The resultant content can be influenced by reconfiguring a query and/or filtering results based on user location and/or context information (e.g., user characteristics/profile, prior interaction/usage temporal, current events, and third party state/context ...). Furthermore, the disclosure provides for query execution on at least a subset of designated web content, for example as specified by a user.

    Cardinality estimation of joins
    19.
    Granted patent (status: active)

    Publication number: US07299226B2

    Publication date: 2007-11-20

    Application number: US10465148

    Filing date: 2003-06-19

    Abstract: A method of estimating cardinality of a join of tables using multi-column density values and additionally using coarser density values of a subset of the multi-column density attributes. In one embodiment, the subset of attributes for the coarser densities is a prefix of the set of multi-column density attributes. A number of tuples from each table that participate in the join may be estimated using densities of the subsets. The cardinality of the join can be estimated using the multi-column density for each table and the estimated number of tuples that participate in the join from each table.
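
As a rough illustration of the quantities involved, the sketch below takes density to mean one over the number of distinct value combinations and combines per-table densities with the textbook containment-assumption formula. This is a swapped-in baseline, not the method of the patent: how the participating tuple counts are derived from the coarser (prefix) densities is not reproduced here, so they are passed in as plain inputs. All names are hypothetical.

```python
def density(rows, columns):
    """1 / (number of distinct value combinations over the given columns)."""
    if not rows:
        raise ValueError("rows must not be empty")
    distinct = {tuple(row[c] for c in columns) for row in rows}
    return 1.0 / len(distinct)


def estimate_join_cardinality(r_participating, s_participating,
                              r_density, s_density):
    """Containment-assumption estimate: |R'| * |S'| * min(density_R, density_S)."""
    return r_participating * s_participating * min(r_density, s_density)


# Tiny example: join R and S on columns (a, b).
R = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 1}, {"a": 2, "b": 1}]
S = [{"a": 1, "b": 1}, {"a": 2, "b": 1}]

multi_r = density(R, ("a", "b"))    # multi-column density on (a, b)
coarse_r = density(R, ("a",))       # coarser density on the prefix (a)
multi_s = density(S, ("a", "b"))

# Assumption for the example: every tuple of both tables participates in the join.
estimate = estimate_join_cardinality(len(R), len(S), multi_r, multi_s)
```

In this toy data the true join on (a, b) has 3 rows, and the baseline estimate is 8/3; the prefix density `coarse_r` is computed only to show the coarser statistic the abstract refers to.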

    Constructing database object workload summaries
    20.
    Granted patent (status: active)

    Publication number: US07299220B2

    Publication date: 2007-11-20

    Application number: US10815061

    Filing date: 2004-03-31

    Abstract: A database object summarization tool is provided that selects a subset of database objects subject to filtering constraints such as a partial order or optimization of some attribute. A dominance primitive filters out tuples that are dominated according to a partial order constraint by another tuple. A representation primitive selects a representative subset of tuples such that an optimization criterion is met.
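
The dominance primitive can be sketched as a simple filter. The partial order used here is an assumption chosen for illustration: one tuple dominates another when it is at least as large in every attribute and strictly larger in at least one. The function names are hypothetical, and the representation primitive is not covered.

```python
def dominates(a, b):
    """True if a is >= b in every attribute and > b in at least one (assumed partial order)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def dominance_filter(tuples):
    """Keep only the tuples that no other tuple dominates (quadratic, for clarity)."""
    return [t for t in tuples
            if not any(dominates(other, t) for other in tuples)]
```

For example, among `(1, 1)`, `(2, 2)`, `(3, 1)` and `(1, 3)`, only `(1, 1)` is dropped, because `(2, 2)` dominates it while the remaining three are mutually incomparable.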
