Patent search: ap:("Nicolas Bruno" OR "Surajit Chaudhuri" OR "Dilys Thomas") AND inv:"Surajit Chaudhuri", page 10

91.

Invention application
IDENTIFYING SYNONYMS OF ENTITIES USING WEB SEARCH (status: pending, published)

Publication No.: US20100293179A1

Publication date: 2010-11-18

Application No.: US12465832

Filing date: 2009-05-14

Applicants: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Dong Xin

IPC classification: G06F17/30

CPC classification: G06F16/951

Abstract: Identifying synonyms of entities using web search results is disclosed herein. In some aspects, a candidate string of tokens of an entity name is selected as a search term. The search term is transmitted by a server to a search engine, which in turn, transmits search results back to the server after performing a search. The server analyzes the search results, generates a score based on the search results, and then determines a status (synonym or not a synonym) of the candidate string based on the score. In further aspects, additional candidate strings are designated as synonyms or not synonyms based on status of the searched candidate string by using relationships of a lattice formed from all possible candidate strings of the entity name.
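
The lattice idea in this abstract can be sketched roughly as follows. This is an illustration only, not the patented method: the scoring rule (fraction of snippets mentioning the full entity name), the 0.5 threshold, the propagation rule (supersets of a synonym are synonyms, subsets of a non-synonym are non-synonyms), and the snippet data are all assumptions made for the example.

```python
from itertools import combinations

def candidates(tokens):
    """All non-empty token subsets of an entity name, as tuples of token positions."""
    n = len(tokens)
    return [idx for k in range(1, n + 1) for idx in combinations(range(n), k)]

def score(snippets, entity_name):
    """Assumed score: fraction of result snippets that mention the full entity name."""
    if not snippets:
        return 0.0
    hits = sum(entity_name.lower() in s.lower() for s in snippets)
    return hits / len(snippets)

def propagate(tokens, searched, is_synonym):
    """Infer statuses of other candidates from one searched candidate.

    Assumed rule: adding tokens to a synonym keeps it a synonym;
    removing tokens from a non-synonym keeps it a non-synonym.
    """
    searched_set = set(searched)
    status = {}
    for cand in candidates(tokens):
        cand_set = set(cand)
        if is_synonym and searched_set <= cand_set:
            status[cand] = True
        elif not is_synonym and cand_set <= searched_set:
            status[cand] = False
    return status

# Made-up toy data: snippets returned for the candidate "EOS 350D".
tokens = ["Canon", "EOS", "350D"]
snippets = ["Canon EOS 350D review", "Buy the Canon EOS 350D body", "EOS 350D lens guide"]
is_syn = score(snippets, "Canon EOS 350D") >= 0.5  # threshold is an assumption
inferred = propagate(tokens, searched=(1, 2), is_synonym=is_syn)
```

Propagating along the lattice means that one search can settle the status of several candidates, which is the saving the abstract alludes to.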

92.

Granted invention patent
Systems and methods for estimating functional relationships in a database (status: in force)

Publication No.: US07562067B2

Publication date: 2009-07-14

Application No.: US11123901

Filing date: 2005-05-06

Applicants: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav

Inventors: Surajit Chaudhuri, Venkatesh Ganti, Kaushik Shriraghav

IPC classification: G06F17/30, G06F7/00, G06F17/00

CPC classification: G06F17/30536, Y10S707/99932

Abstract: A system that facilitates estimating functional relationships associated with one or more columns in a database comprises a sampling component that receives a random sample of records within the database. An estimate generator component calculates an estimate of strength of functional relationships based at least in part upon the received samples. For example, the estimate generator component can calculate an estimate of strength of a column as a key column based at least in part upon the received samples.
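
A minimal sketch of estimating how "key-like" a column is from a random sample. The strength measure used here (distinct values divided by sampled rows) and all function names are assumptions for illustration; the abstract does not say which estimator the patent uses, and this naive ratio is known to be biased on small samples.

```python
import random

def key_strength(values):
    """Assumed measure: share of distinct values; 1.0 means no duplicates seen."""
    return len(set(values)) / len(values) if values else 0.0

def estimate_key_strength(rows, column, sample_size, seed=0):
    """Estimate key strength of `column` from a uniform random sample of rows."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return key_strength([row[column] for row in sample])

# Hypothetical table: "id" is unique, "country" is constant.
rows = [{"id": i, "country": "US"} for i in range(100)]
id_strength = estimate_key_strength(rows, "id", sample_size=10)
country_strength = estimate_key_strength(rows, "country", sample_size=10)
```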

93.

Granted invention patent
Query progress estimation (status: in force)

Publication No.: US07493337B2

Publication date: 2009-02-17

Application No.: US10813963

Filing date: 2004-03-31

Applicants: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy

Inventors: Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy

IPC classification: G06F7/00, G06F17/00, G06F17/30

CPC classification: G06F17/30522, G06F17/30306, Y10S707/99932, Y10S707/99945, Y10S707/99948

Abstract: A query progress indicator that provides an indication to a user of the progress of a query being executed on a database. The indication of the progress of the query allows the user to decide whether the query should be allowed to complete or should be aborted. One method that may be used to estimate the progress of a query that is being executed on a database defines a model of work performed during execution of a query. The total amount of work that will be performed during execution of the query is estimated according to the model. The amount of work performed at a given point during execution of the query is estimated according to the model. The progress of the query is estimated using the estimated amount of work at the given point in time and the estimated total amount of work. This estimated progress of query execution may be provided to the user.
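
The ratio described in this abstract can be sketched as below. The work model (rows processed per operator) and the rule of raising an operator's total to at least the work already observed are assumptions chosen for the example, not details taken from the patent.

```python
def estimate_progress(done, estimated_total):
    """Progress = work done so far / estimated total work, clamped to [0, 1].

    `done` and `estimated_total` map operator names to row counts
    (an assumed work model). An operator's total is never taken to be
    smaller than the work it has already performed.
    """
    operators = set(done) | set(estimated_total)
    total = sum(max(estimated_total.get(op, 0), done.get(op, 0)) for op in operators)
    if total == 0:
        return 0.0
    return min(1.0, sum(done.values()) / total)

# Hypothetical two-operator plan, halfway through.
progress = estimate_progress(
    done={"scan": 500, "join": 100},
    estimated_total={"scan": 1000, "join": 200},
)
```

A user-facing indicator would call this periodically while the query runs and offer the option to abort when progress is slow.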

94.

Granted invention patent
Ranking database query results using probabilistic models from information retrieval (status: lapsed)

Publication No.: US07383262B2

Publication date: 2008-06-03

Application No.: US10879450

Filing date: 2004-06-29

Applicants: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum

Inventors: Gautam Das, Surajit Chaudhuri, Vagelis Hristidis, Gerhard Weikum

IPC classification: G06F7/00

CPC classification: G06Q30/0603, G06Q50/16, Y10S707/99937

Abstract: A system and methods rank results of database queries. An automated approach for ranking database query results is disclosed that leverages data and workload statistics and associations. Ranking functions are based upon the principles of probabilistic models from Information Retrieval that are adapted for structured data. The ranking functions are encoded into an intermediate knowledge representation layer. The system is generic, as the ranking functions can be further customized for different applications. Benefits of the disclosed system and methods include the use of adapted probabilistic information retrieval (PIR) techniques that leverage relational/structured data, such as columns, to provide natural groupings of data values. This permits the inference and use of pair-wise associations between data values across columns, which are usually not possible with text data.

95.

Invention application
VISUAL AND MULTI-DIMENSIONAL SEARCH (status: lapsed)

Publication No.: US20080005091A1

Publication date: 2008-01-03

Application No.: US11427303

Filing date: 2006-06-28

Applicants: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller

Inventors: Stephen Lawler, Eric J. Horvitz, Joshua T. Goodman, Anoop Gupta, Christopher A. Meek, Eric D. Brill, Gary W. Flake, Ramez Naam, Surajit Chaudhuri, Oliver Hurst-Hiller

IPC classification: G06F17/30

CPC classification: G06F17/30864, G06F17/30592, Y10S707/913

Abstract: A system that can analyze a multi-dimensional input and thereafter establish a search query based upon features extracted from the input. In a particular example, an image can be used as an input to a search mechanism. Pattern recognition and image analysis can be applied to the image, thereafter establishing a search query that corresponds to features extracted from the image input. The system can also facilitate indexing multi-dimensional searchable items, thereby making them available to be retrieved as results to a search query. More particularly, the system can employ text analysis, pattern and/or speech recognition mechanisms to extract features from searchable items. These extracted features can be employed to index the searchable items.

96.

Granted invention patent
Compressing database workloads (status: in force)

Publication No.: US07293036B2

Publication date: 2007-11-06

Application No.: US11008335

Filing date: 2004-12-08

Applicants: Surajit Chaudhuri, Ashish Kumar Gupta, Vivek Narasayya

Inventors: Surajit Chaudhuri, Ashish Kumar Gupta, Vivek Narasayya

IPC classification: G06F17/30

CPC classification: G06F17/30536, G06F17/30306, G06F17/30312, Y10S706/917, Y10S707/99932, Y10S707/99942, Y10S707/99945

Abstract: Relational database applications such as index selection, histogram tuning, approximate query processing, and statistics selection have recognized the importance of leveraging workloads. Often these applications are presented with large workloads, i.e., a set of SQL DML statements, as input. A key factor affecting the scalability of such applications is the size of the workload. The invention concerns workload compression which helps improve the scalability of such applications. The exemplary embodiment is broadly applicable to a variety of workload-driven applications, while allowing for incorporation of application specific knowledge. The process is described in detail in the context of two workload-driven applications: index selection and approximate query processing.

97.

Granted invention patent
Optimization based method for estimating the results of aggregate queries (status: lapsed)

Publication No.: US07281007B2

Publication date: 2007-10-09

Application No.: US10935803

Filing date: 2004-09-08

Applicants: Surajit Chaudhuri, Vivek Narasayya, Gautam Das

Inventors: Surajit Chaudhuri, Vivek Narasayya, Gautam Das

IPC classification: G06F17/30

CPC classification: G06F17/30536, G06F17/30489, Y10S707/99933, Y10S707/99934, Y10S707/99936, Y10S707/99937, Y10S707/99943, Y10S707/99945

Abstract: A method for estimating the result of a query on a database having data records arranged in tables. The database has an expected workload that includes a set of queries that can be executed on the database. An expected workload is derived including a set of queries that can be executed on the database. A sample is constructed by selecting data records for inclusion in the sample in a manner that minimizes an estimation error when the data records are acted upon by a query in the expected workload to provide an expected result. The query accesses the sample and is executed on the sample, returning an estimated query result. The expected workload can be constructed by specifying a degree of overlap between records selected by queries in the given workload and records selected by queries in the expected workload.
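
To make the "execute the query on the sample" step concrete, here is a sketch that answers a SUM query from a sample and scales the result up. It deliberately swaps in plain uniform sampling; the workload-aware sample selection that the abstract describes is not reproduced. The table, column names, and sampling fraction are hypothetical.

```python
import random

def build_uniform_sample(rows, fraction, seed=0):
    """Return (sample, scale) where scale = table size / sample size."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k), len(rows) / k

def estimate_sum(sample, scale, column, predicate):
    """Run SUM(column) WHERE predicate on the sample and scale to the full table."""
    return scale * sum(row[column] for row in sample if predicate(row))

# Hypothetical sales table.
rows = [{"region": "EU" if i % 4 == 0 else "US", "amount": i % 50} for i in range(10_000)]
sample, scale = build_uniform_sample(rows, fraction=0.05)
estimated = estimate_sum(sample, scale, "amount", lambda r: r["region"] == "EU")
exact = sum(r["amount"] for r in rows if r["region"] == "EU")
```

Comparing `estimated` with `exact` over a set of expected queries gives the kind of estimation error that a workload-aware sample would be chosen to minimize.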

98.

Invention application
Robust cardinality and cost estimation for skyline operator (status: in force)

Publication No.: US20070198439A1

Publication date: 2007-08-23

Application No.: US11357665

Filing date: 2006-02-17

Applicants: Kaushik Shriraghav, Surajit Chaudhuri, Nilesh Dalvi

Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Nilesh Dalvi

IPC classification: G06F17/00

CPC classification: G06F17/30469, G06Q30/0283

Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, an appropriate skyline generating technique is utilized to produce a skyline representative of the received queries and their associated preferences.
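
For readers unfamiliar with the skyline operator itself, the sketch below shows what it returns: the rows not dominated by any other row. It illustrates only the operator's semantics with a naive quadratic scan; the cardinality and cost estimation that the abstract is about is not shown. The "smaller is better" convention and the hotel data are assumptions.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one.

    Assumes smaller values are preferred in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels as (price, distance_to_beach_km).
hotels = [(50, 8), (80, 2), (60, 9), (100, 1), (90, 3)]
best = skyline(hotels)
```

The size of `best` relative to the input is the skyline cardinality, which an optimizer would need to estimate before choosing among skyline algorithms.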

99.

Granted invention patent
Automatically ranking answers to database queries (status: lapsed)

Publication No.: US07251648B2

Publication date: 2007-07-31

Application No.: US10186027

Filing date: 2002-06-28

Applicants: Surajit Chaudhuri, Gautam Das, Aris Gionis

Inventors: Surajit Chaudhuri, Gautam Das, Aris Gionis

IPC classification: G06F17/30

CPC classification: G06F17/3053, Y10S707/99933, Y10S707/99935

Abstract: A method for automatically ranking database records by relevance to a given query. A similarity function is derived from data in the database and/or queries in a workload. The derived similarity function is applied to a given query and records in the database to rank the records. The records are returned in a ranked order.
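
One way to derive a similarity function from the data alone is an IDF-style weighting, sketched below. The choice of IDF, the exact-match scoring, and the sample records are assumptions for illustration; the abstract does not state which similarity function the patent derives.

```python
import math
from collections import Counter

def derive_idf(records):
    """Weight each (column, value) pair by log(N / frequency): rare values count more."""
    n = len(records)
    counts = Counter((col, val) for rec in records for col, val in rec.items())
    return {key: math.log(n / c) for key, c in counts.items()}

def rank(records, query, idf):
    """Order records by the summed weight of query attributes they match exactly."""
    def similarity(rec):
        return sum(idf.get((col, val), 0.0)
                   for col, val in query.items() if rec.get(col) == val)
    return sorted(records, key=similarity, reverse=True)

# Hypothetical car listings.
cars = [
    {"id": "A", "make": "Honda", "color": "red"},
    {"id": "B", "make": "Honda", "color": "blue"},
    {"id": "C", "make": "Toyota", "color": "red"},
    {"id": "D", "make": "Toyota", "color": "blue"},
    {"id": "E", "make": "Toyota", "color": "red"},
]
idf = derive_idf(cars)
ranked_ids = [c["id"] for c in rank(cars, {"make": "Honda", "color": "red"}, idf)]
```

Because "Honda" is rarer than "red" in this toy table, a record matching only the make outranks one matching only the color.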

100.

Granted invention patent
Query optimization by sub-plan memoization (status: in force)

Publication No.: US07240044B2

Publication date: 2007-07-03

Application No.: US10941113

Filing date: 2004-09-15

Applicants: Surajit Chaudhuri, Ashraf I Aboulnaga

Inventors: Surajit Chaudhuri, Ashraf I Aboulnaga

IPC classification: G06F17/30, G06F7/00

CPC classification: G06F17/30469, Y10S707/99932, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99943

Abstract: Database system query optimizers use several techniques such as histograms and sampling to estimate the result sizes of operators and sub-plans (operator trees) and the number of distinct values in their outputs. Instead of estimates, the invention uses the exact actual values of the result sizes and the number of distinct values in the outputs of sub-plans encountered by the optimizer. This is achieved by optimizing the query in phases. In each phase, newly encountered sub-plans are recorded for which result size and/or distinct value estimates are required. These sub-plans are executed at the end of the phase to determine their actual result sizes and the actual number of distinct values in their outputs. In subsequent phases, the optimizer uses these actual values when it encounters the same sub-plan again.
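
The phased loop in this abstract can be sketched as follows. The callback interface (`optimize_once`, `execute_subplan`), the toy two-join optimizer, and the default guess of 100 rows are all hypothetical; they stand in for a real optimizer and execution engine.

```python
def optimize_in_phases(optimize_once, execute_subplan, max_phases=10):
    """Re-optimize until no sub-plan is left with a guessed size."""
    actual_sizes = {}  # memo: sub-plan -> measured result size
    plan = None
    for _ in range(max_phases):
        plan, encountered = optimize_once(actual_sizes)
        new_subplans = [sp for sp in encountered if sp not in actual_sizes]
        if not new_subplans:
            break  # every size the optimizer used was an actual value
        for sp in new_subplans:  # end of phase: run the recorded sub-plans
            actual_sizes[sp] = execute_subplan(sp)
    return plan, actual_sizes

# Toy setting (hypothetical): choose which two-table join to run first.
TRUE_SIZES = {"A join B": 5000, "B join C": 10}
DEFAULT_GUESS = 100

def toy_optimize_once(actual_sizes):
    """Pick the sub-plan with the smallest known (or guessed) result size."""
    subplans = list(TRUE_SIZES)
    best = min(subplans, key=lambda sp: actual_sizes.get(sp, DEFAULT_GUESS))
    return best, subplans

plan, memo = optimize_in_phases(toy_optimize_once, TRUE_SIZES.__getitem__)
```

In the first phase both joins carry the same guess, so the toy optimizer picks "A join B"; once the actual sizes are memoized, the second phase switches to "B join C".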
