Patent search: ap:("Arvind Arasu" OR "Parag Agrawal" OR "Kaushik Shriraghav") AND inv:"Kaushik Shriraghav" (page 1)

1.

Granted invention patent
Efficient indexing of error tolerant set containment (status: in force)

Publication number: US08606771B2

Publication date: 2013-12-10

Application number: US12973909

Filing date: 2010-12-21

Applicants: Arvind Arasu, Parag Agrawal, Kaushik Shriraghav

Inventors: Arvind Arasu, Parag Agrawal, Kaushik Shriraghav

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30336

Abstract: The claimed subject matter provides a method and a system for the efficient indexing of error tolerant set containment. An exemplary method comprises obtaining a frequency threshold and a query set. All tokens or token sets within the query set are determined, and then all minimal infrequent tokens or all minimal infrequent token sets of data records are found and used to build an index. The minimal infrequent tokens or minimal infrequent token sets are processed in a fixed order, and then a collection of signatures for each minimal infrequent token or token set is determined.
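
To make the indexing step in the abstract above concrete, here is a minimal brute-force sketch in Python. It assumes that a token set counts as infrequent when at most `threshold` records contain it, and that it is minimal when no proper subset is itself infrequent; the abstract does not spell out either definition. The function name `minimal_infrequent_sets` and the `max_size` cap are hypothetical, and the fixed-order processing and signature generation mentioned in the abstract are not shown.

```python
from itertools import combinations


def minimal_infrequent_sets(records, threshold, max_size=2):
    """Map each minimal infrequent token set to the ids of records containing it.

    Assumption (not from the patent text): a token set is infrequent when
    at most `threshold` records contain all of its tokens.
    """
    index = {}
    for size in range(1, max_size + 1):
        # Enumerate every token set of this size that occurs in some record.
        candidates = set()
        for rec in records:
            for combo in combinations(sorted(rec), size):
                candidates.add(frozenset(combo))
        for cand in candidates:
            # A set with an infrequent proper subset is not minimal.
            if any(known < cand for known in index):
                continue
            ids = [i for i, rec in enumerate(records) if cand <= rec]
            if len(ids) <= threshold:
                index[cand] = ids
    return index


records = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
index = minimal_infrequent_sets(records, threshold=2)
# "a" occurs in all three records, so it is frequent; "b" and "c" each
# occur in two records and become the index keys.
```

Enumerating all subsets is exponential in `max_size`, so this only illustrates the definition on small inputs.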

2.

Invention patent application
EFFICIENT INDEXING OF ERROR TOLERANT SET CONTAINMENT (status: in force)

Publication number: US20120158696A1

Publication date: 2012-06-21

Application number: US12973909

Filing date: 2010-12-21

Applicants: Arvind Arasu, Parag Agrawal, Kaushik Shriraghav

Inventors: Arvind Arasu, Parag Agrawal, Kaushik Shriraghav

IPC classification: G06F17/30

CPC classification: G06F17/30336

Abstract: The claimed subject matter provides a method and a system for the efficient indexing of error tolerant set containment. An exemplary method comprises obtaining a frequency threshold and a query set. All tokens or token sets within the query set are determined, and then all minimal infrequent tokens or all minimal infrequent token sets of data records are found and used to build an index. The minimal infrequent tokens or minimal infrequent token sets are processed in a fixed order, and then a collection of signatures for each minimal infrequent token or token set is determined.

3.

Granted invention patent
Efficient exact set similarity joins (status: in force)

Publication number: US07865505B2

Publication date: 2011-01-04

Application number: US11668870

Filing date: 2007-01-30

Applicants: Arvind Arasu, Venkatesh Ganti, Kaushik Shriraghav

Inventors: Arvind Arasu, Venkatesh Ganti, Kaushik Shriraghav

IPC classification: G06F7/00, G06F17/30

CPC classification: G06F17/30498, G06F17/30533

Abstract: A machine-implemented system and method that efficiently facilitates and effectuates exact similarity joins between collections of sets. The system and method obtain a collection of sets and a threshold value from an interface and, based at least in part on an identifiable similarity (such as an overlap or intersection) between the collections of sets, an analysis component generates and outputs a candidate pair whose similarity equals or exceeds the threshold value.
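
As an illustration of what an exact overlap-based join returns, the following Python sketch counts shared tokens through an inverted index and keeps pairs that meet the threshold. This is a generic textbook technique, not the method claimed in the patent; it is simplified to a self-join over one collection, and the name `overlap_join` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def overlap_join(sets, threshold):
    """Return all pairs (i, j), i < j, whose sets share >= threshold tokens."""
    # Inverted index: token -> ids of the sets that contain it.
    inverted = defaultdict(list)
    for sid, tokens in enumerate(sets):
        for tok in tokens:
            inverted[tok].append(sid)
    # Every token shared by two sets adds one to that pair's overlap count.
    counts = defaultdict(int)
    for ids in inverted.values():
        for a, b in combinations(ids, 2):
            counts[(a, b)] += 1
    return sorted(pair for pair, n in counts.items() if n >= threshold)


pairs = overlap_join([{1, 2, 3}, {2, 3, 4}, {5, 6}], threshold=2)
# pairs == [(0, 1)]: sets 0 and 1 share the tokens 2 and 3.
```

The result is exact because every shared token is counted; the cost grows quadratically with the length of the longest posting list.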

4.

Invention patent application
SYNTHETIC DATA GENERATION (status: under examination, published)

Publication number: US20120330880A1

Publication date: 2012-12-27

Application number: US13166831

Filing date: 2011-06-23

Applicants: Arvind Arasu, Kaushik Shriraghav, Jian Li

Inventors: Arvind Arasu, Kaushik Shriraghav, Jian Li

IPC classification: G06F17/30, G06N5/02

CPC classification: G06F16/24544

Abstract: The claimed subject matter provides a method for data generation. The method includes identifying a generative probability distribution based on one or more cardinality constraints for populating a database table. The method also includes selecting one or more values for a corresponding one or more attributes in the database table based on the generative probability distribution and the cardinality constraints. Additionally, the method includes generating a tuple for the database table. The tuple comprises the one or more values.
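
A very small Python sketch of the idea in the abstract above, under strong simplifying assumptions: the table has a single attribute, each cardinality constraint gives the desired row count for one value, and the generative distribution is taken to be the normalized counts. The function name `generate_tuples` and the constraint format are hypothetical and are not taken from the application.

```python
import random


def generate_tuples(cardinalities, n_rows, seed=0):
    """Sample single-attribute tuples with probabilities proportional to
    the target cardinality of each value (an assumed, simplified model)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    total = sum(cardinalities.values())
    values = list(cardinalities)
    weights = [cardinalities[v] / total for v in values]
    return [(rng.choices(values, weights=weights)[0],) for _ in range(n_rows)]


# Target: about three "US" rows for every "UK" row.
rows = generate_tuples({"US": 3, "UK": 1}, n_rows=8)
```

Sampling only meets the constraints in expectation; satisfying several overlapping constraints exactly is the harder problem the application addresses.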

5.

Invention patent application
DESIGNING RECORD MATCHING QUERIES UTILIZING EXAMPLES (status: in force)

Publication number: US20070294221A1

Publication date: 2007-12-20

Application number: US11424191

Filing date: 2006-06-14

Applicants: Bee-Chung Chen, Venkatesh Ganti, Kaushik Shriraghav

Inventors: Bee-Chung Chen, Venkatesh Ganti, Kaushik Shriraghav

IPC classification: G06F17/30

CPC classification: G06F17/30489, Y10S707/99933, Y10S707/99934

Abstract: The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning, ...), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.

6.

Granted invention patent
Robust cardinality and cost estimation for skyline operator (status: in force)

Publication number: US07707207B2

Publication date: 2010-04-27

Application number: US11357665

Filing date: 2006-02-17

Applicants: Kaushik Shriraghav, Surajit Chaudhuri, Nilesh N. Dalvi

Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Nilesh N. Dalvi

IPC classification: G06F17/30, G06F15/16

CPC classification: G06F17/30469, G06Q30/0283

Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, an appropriate skyline generating technique is utilized to produce a skyline representative of the received queries and their associated preferences.
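
For readers unfamiliar with the skyline operator named in the abstract above, the following Python sketch shows what it computes: the points that no other point dominates. It assumes that smaller is better in every dimension and uses a naive quadratic scan; the cardinality and cost estimation that the patent covers is not shown, and the function names are hypothetical.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better
    somewhere (assumption: smaller values are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (price, distance) pairs: (3, 3) is dominated by (2, 2) and drops out.
result = skyline([(1, 5), (2, 2), (3, 3), (5, 1)])
# result == [(1, 5), (2, 2), (5, 1)]
```

The size of this result is the cardinality that an optimizer would need to estimate before choosing a skyline algorithm.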

7.

Granted invention patent
Designing record matching queries utilizing examples (status: in force)

Publication number: US07634464B2

Publication date: 2009-12-15

Application number: US11424191

Filing date: 2006-06-14

Applicants: Bee-Chung Chen, Venkatesh Ganti, Kaushik Shriraghav

Inventors: Bee-Chung Chen, Venkatesh Ganti, Kaushik Shriraghav

IPC classification: G06F17/30, G06F7/00

CPC classification: G06F17/30489, Y10S707/99933, Y10S707/99934

Abstract: The subject disclosure pertains to a powerful and flexible framework for record matching. The framework facilitates design of a record matching query or package composed of a set of well-defined primitive operators (e.g., relational, data cleaning, ...), which can ultimately be executed to match records. To assist design of such packages, a learning technique based on examples is provided. More specifically, a set of matching and non-matching record pairs can be input and employed to facilitate automatic package generation. A generated package can subsequently be transformed manually and/or automatically into a semantically equivalent form optimized for execution.

8.

Invention patent application
MINIMAL DIFFERENCE QUERY AND VIEW MATCHING (status: under examination, published)

Publication number: US20070192297A1

Publication date: 2007-08-16

Application number: US11558029

Filing date: 2006-11-09

Applicants: Kaushik Shriraghav, Venkatesh Ganti, Xin Dong

Inventors: Kaushik Shriraghav, Venkatesh Ganti, Xin Dong

IPC classification: G06F17/30

CPC classification: G06F16/24535, Y10S707/99932, Y10S707/99933, Y10S707/99934

Abstract: The subject disclosure pertains to efficient computation of the difference between queries by exploiting commonality between them. A minimal difference query (MDQ) is generated that roughly corresponds to removal of as many joins as possible while still accurately representing the query difference. The minimal difference can be employed to substantially further the scope of view matching where a query is not wholly subsumed by a view. Additionally, the minimal difference query can be employed as an analytical tool in various contexts.

9.

Granted invention patent
Techniques for estimating progress of database queries (status: in force)

Publication number: US07454407B2

Publication date: 2008-11-18

Application number: US11149968

Filing date: 2005-06-10

Applicants: Surajit Chaudhuri, Ravishankar Ramamurthy, Kaushik Shriraghav

Inventors: Surajit Chaudhuri, Ravishankar Ramamurthy, Kaushik Shriraghav

IPC classification: G06F7/00

CPC classification: G06F17/30522, G06F17/30474, Y10S707/99932, Y10S707/99933, Y10S707/99934, Y10S707/99935

Abstract: Techniques for estimating the progress of database queries are described herein. In a first implementation, a respective lower-bound parameter is associated with each node in an operator tree that represents a given database query, and the progress of the database query at a given point is estimated based upon the lower-bound parameters. In a second implementation, the progress of the query is estimated by associating respective lower-bound and upper-bound parameters with each node in the operator tree. The progress of the query at the given point is then estimated based on the lower-bound and upper-bound parameters.
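
One way to read the second implementation in the abstract above is sketched below in Python. Everything specific here is an assumption made for illustration: progress is measured as tuples processed so far divided by total tuples, each node carries a lower and an upper bound on its total, and the `Node` fields and `progress_bounds` name are hypothetical. The patent's actual estimator may combine the bounds differently.

```python
from dataclasses import dataclass


@dataclass
class Node:
    done: int   # tuples this operator has processed so far
    lower: int  # assumed lower bound on its total tuples
    upper: int  # assumed upper bound on its total tuples


def progress_bounds(nodes):
    """Return (min_progress, max_progress) as fractions of total work."""
    done = sum(n.done for n in nodes)
    # A bound can never be smaller than the work already observed.
    lo_total = sum(max(n.lower, n.done) for n in nodes)
    hi_total = sum(max(n.upper, n.done) for n in nodes)
    if lo_total == 0:
        raise ValueError("no work recorded for any node")
    # The largest possible total gives the smallest possible progress.
    return done / hi_total, done / lo_total


low, high = progress_bounds([Node(50, 100, 200), Node(0, 100, 100)])
# done = 50, totals between 200 and 300, so progress lies in [1/6, 1/4].
```

As execution proceeds, the bounds tighten and the interval narrows toward the true progress.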

10.

Invention patent application
Primitive operator for similarity joins in data cleaning (status: in force)

Publication number: US20070192342A1

Publication date: 2007-08-16

Application number: US11352141

Filing date: 2006-02-10

Applicants: Kaushik Shriraghav, Surajit Chaudhuri, Venkatesh Ganti

Inventors: Kaushik Shriraghav, Surajit Chaudhuri, Venkatesh Ganti

IPC classification: G06F7/00

CPC classification: G06F17/30442, Y10S707/99942, Y10S707/99943

Abstract: A set similarity join system and method are provided. The system can be employed to facilitate data cleaning based on similarities through the identification of “close” tuples (e.g., records and/or rows). “Closeness” can be evaluated using a similarity function(s) chosen to suit the domain and/or application. Thus, the system facilitates generic domain-independent data cleansing. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, Hamming distance, Soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on “sets” associated with (or explicitly constructed for) each one of them.
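
The observation that set overlap can support other similarity functions can be illustrated with Jaccard similarity, which is fully determined by the two set sizes and the overlap. The Python sketch below builds q-gram sets for two strings and derives Jaccard from the overlap count. The helper names and the choice of q-grams as the constructed sets are assumptions for illustration, not the SSJoin operator itself.

```python
def qgrams(s, q=2):
    """Set of all length-q substrings of s (one way to build a set per value)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard_from_overlap(a, b):
    """Jaccard similarity computed from the sizes |a|, |b| and the overlap."""
    overlap = len(a & b)
    union = len(a) + len(b) - overlap
    return overlap / union if union else 1.0


x, y = qgrams("abcd"), qgrams("abce")
# x = {"ab", "bc", "cd"}, y = {"ab", "bc", "ce"}: overlap 2, union 4.
sim = jaccard_from_overlap(x, y)  # 0.5
```

Because the union size follows from the two set sizes and the overlap, a Jaccard threshold can be checked once the overlap is known.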
