IDENTIFYING SYNONYMS OF ENTITIES USING WEB SEARCH
    91.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20100293179A1

    Publication date: 2010-11-18

    Application No.: US12465832

    Filing date: 2009-05-14

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which, in turn, performs a search and transmits the search results back to the server. The server analyzes the search results, generates a score based on them, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on the status of the searched candidate string by using relationships of a lattice formed from all possible candidate strings of the entity name.

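    The lattice idea in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the scoring rule (fraction of results that mention the full entity name), the 0.5 threshold, the propagation rule (a candidate that contains all tokens of a known synonym is also treated as a synonym), and the `fake_search` stand-in for a search engine are all assumptions made for the example.

```python
from itertools import combinations

def classify_candidates(entity, search, threshold=0.5):
    """Label every token subset of `entity` as synonym (True) or not (False).

    `search` is a hypothetical callable: query string -> list of result snippets.
    Returns (status by candidate string, number of searches issued).
    """
    tokens = entity.split()
    n = len(tokens)
    status = {}      # frozenset of token positions -> bool
    searches = 0
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            key = frozenset(idx)
            if key in status:
                continue  # already decided through the lattice, no search needed
            results = search(" ".join(tokens[i] for i in idx))
            searches += 1
            # Assumed score: share of results mentioning the full entity name.
            hits = sum(entity.lower() in r.lower() for r in results)
            is_synonym = bool(results) and hits / len(results) >= threshold
            status[key] = is_synonym
            if is_synonym:
                # Assumed lattice rule: supersets of a synonym are synonyms.
                for bigger in range(size + 1, n + 1):
                    for sup in combinations(range(n), bigger):
                        if key <= set(sup):
                            status.setdefault(frozenset(sup), True)
    named = {" ".join(tokens[i] for i in sorted(k)): v for k, v in status.items()}
    return named, searches

def fake_search(query):
    """Hypothetical canned results standing in for a real search engine."""
    corpus = {
        "Canon": ["Canon printers", "Canon Inc. home page"],
        "EOS": ["EOS lip balm", "EOS blockchain"],
        "350D": ["Canon EOS 350D review", "Canon EOS 350D manual"],
        "Canon EOS": ["Canon EOS 5D", "Canon EOS lenses"],
    }
    return corpus.get(query, [])

status, searches = classify_candidates("Canon EOS 350D", fake_search)
```

    In this toy run only four of the seven candidate strings are actually searched; the remaining three are decided through the lattice.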

    Systems and methods for estimating functional relationships in a database
    92.
    Type: Granted patent
    Status: In force

    Publication No.: US07562067B2

    Publication date: 2009-07-14

    Application No.: US11123901

    Filing date: 2005-05-06

    IPC classification: G06F17/30 G06F7/00 G06F17/00

    Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.

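    As a rough illustration of estimating how strongly a column behaves like a key from a random sample, the sketch below uses the fraction of distinct values in the sample. This measure and the names `estimate_key_strength`, `rows`, and `sample` are assumptions for the example; the abstract does not state which estimator the patent uses. A plain distinct-value ratio on a small sample tends to overstate key strength, because duplicates that exist in the full table may not appear in the sample.

```python
import random

def estimate_key_strength(sample, column):
    """Fraction of sampled rows whose value in `column` is distinct (1.0 = looks like a key)."""
    if not sample:
        raise ValueError("sample must not be empty")
    values = [row[column] for row in sample]
    return len(set(values)) / len(values)

# Hypothetical table: `id` is unique, `city` takes only five values.
rows = [{"id": i, "city": i % 5} for i in range(1000)]
sample = random.Random(0).sample(rows, 100)  # random sample without replacement

id_strength = estimate_key_strength(sample, "id")
city_strength = estimate_key_strength(sample, "city")
```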

    Query progress estimation
    93.
    Type: Granted patent
    Status: In force

    Publication No.: US07493337B2

    Publication date: 2009-02-17

    Application No.: US10813963

    Filing date: 2004-03-31

    IPC classification: G06F7/00 G06F17/00 G06F17/30

    Abstract: A query progress indicator that provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.

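    The ratio described in the abstract above (work done so far over estimated total work) can be sketched as below. The work model used here, counting rows processed per operator, is an assumption for the example, as are the operator names and numbers.

```python
def estimate_progress(work_done, estimated_total_work):
    """Estimated progress in [0, 1] as work done divided by estimated total work."""
    if estimated_total_work <= 0:
        raise ValueError("estimated total work must be positive")
    # Clamp at 1.0: the total is only an estimate and may be exceeded.
    return min(1.0, work_done / estimated_total_work)

# Hypothetical plan: (rows processed so far, estimated total rows) per operator.
operators = {
    "scan_orders": (40_000, 100_000),
    "join_customers": (5_000, 20_000),
}
done = sum(d for d, _ in operators.values())
total = sum(t for _, t in operators.values())
progress = estimate_progress(done, total)
```

    Clamping matters in practice because the total is itself an estimate, so the observed work can overshoot it before the query finishes.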

    Ranking database query results using probabilistic models from information retrieval
    94.
    Type: Granted patent
    Status: No longer in force

    Publication No.: US07383262B2

    Publication date: 2008-06-03

    Application No.: US10879450

    Filing date: 2004-06-29

    IPC classification: G06F7/00

    Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.


    Compressing database workloads
    96.
    Type: Granted patent
    Status: In force

    Publication No.: US07293036B2

    Publication date: 2007-11-06

    Application No.: US11008335

    Filing date: 2004-12-08

    IPC classification: G06F17/30

    Abstract: Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.


    Optimization based method for estimating the results of aggregate queries
    97.
    Type: Granted patent
    Status: No longer in force

    Publication No.: US07281007B2

    Publication date: 2007-10-09

    Application No.: US10935803

    Filing date: 2004-09-08

    IPC classification: G06F17/30

    Abstract: A method for estimating the result of a query on a database having data records arranged in tables. An expected workload is derived for the database, comprising a set of queries that can be executed on it. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.


    Robust cardinality and cost estimation for skyline operator
    98.
    Type: Patent application
    Status: In force

    Publication No.: US20070198439A1

    Publication date: 2007-08-23

    Application No.: US11357665

    Filing date: 2006-02-17

    IPC classification: G06F17/00

    CPC classification: G06F17/30469 G06Q30/0283

    Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, uses an appropriate skyline-generating technique to produce a skyline representative of the received queries and their associated preferences.

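    For readers unfamiliar with the skyline operator named above, the sketch below shows what it computes: the records not dominated by any other record. It uses a simple nested-loop scan and assumes lower values are preferred in every dimension. It does not reproduce the cardinality or cost estimation that the patent concerns, and the hotel data is made up for the example.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere.

    Assumes lower values are preferred in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Return the points that no other point dominates (simple nested-loop scan)."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current skyline point
        # p survives: drop skyline points that p dominates, then keep p
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result

# Hypothetical hotels as (price, distance to beach); cheaper and closer is better.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (55, 9)]
sky = skyline(hotels)
```

    The number of points that survive, three out of five here, is the skyline cardinality that an optimizer would need to estimate before choosing a generation technique.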

    Automatically ranking answers to database queries
    99.
    Type: Granted patent
    Status: No longer in force

    Publication No.: US07251648B2

    Publication date: 2007-07-31

    Application No.: US10186027

    Filing date: 2002-06-28

    IPC classification: G06F17/30

    Abstract: A method for automatically ranking database records by relevance to a given query. A similarity function is derived from data in the database and/or queries in a workload. The derived similarity function is applied to a given query and records in the database to rank the records. The records are returned in a ranked order.


    Query optimization by sub-plan memoization
    100.
    Type: Granted patent
    Status: In force

    Publication No.: US07240044B2

    Publication date: 2007-07-03

    Application No.: US10941113

    Filing date: 2004-09-15

    IPC classification: G06F17/30 G06F7/00

    Abstract: Database system query optimizers use several techniques such as histograms and sampling to estimate the result sizes of operators and sub-plans (operator trees) and the number of distinct values in their outputs. Instead of estimates, the invention uses the exact actual values of the result sizes and the number of distinct values in the outputs of sub-plans encountered by the optimizer. This is achieved by optimizing the query in phases. In each phase, newly encountered sub-plans are recorded for which result size and/or distinct value estimates are required. These sub-plans are executed at the end of the phase to determine their actual result sizes and the actual number of distinct values in their outputs. In subsequent phases, the optimizer uses these actual values when it encounters the same sub-plan again.

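    The phased scheme in the abstract above can be sketched as a small memo table. This is a simplified illustration under stated assumptions: sub-plans are represented as plain strings, only result sizes are tracked (not distinct-value counts), and `SubPlanMemo`, `execute_subplan`, and the sample sizes are hypothetical names and values introduced for the example.

```python
class SubPlanMemo:
    """Records sub-plans seen during a phase and replaces estimates with actual sizes."""

    def __init__(self, execute_subplan):
        # `execute_subplan` is a hypothetical callable: sub-plan -> actual result size.
        self._execute = execute_subplan
        self._actual = {}      # sub-plan -> actual result size from an earlier phase
        self._pending = set()  # sub-plans first encountered in the current phase

    def cardinality(self, subplan, estimate):
        """Return the actual size if known; otherwise record the sub-plan and fall back."""
        if subplan in self._actual:
            return self._actual[subplan]
        self._pending.add(subplan)
        return estimate

    def end_phase(self):
        """Execute every newly recorded sub-plan and memoize its actual result size."""
        executed = len(self._pending)
        for subplan in self._pending:
            self._actual[subplan] = self._execute(subplan)
        self._pending.clear()
        return executed

# Hypothetical true sizes standing in for real sub-plan execution.
true_sizes = {"orders JOIN customers": 1200}
memo = SubPlanMemo(lambda subplan: true_sizes[subplan])

first = memo.cardinality("orders JOIN customers", estimate=5000)   # phase 1: estimate
executed = memo.end_phase()
second = memo.cardinality("orders JOIN customers", estimate=5000)  # phase 2: actual
```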