Patent search: ap:("Tanveer A. Faruquie" OR "Sachindra Joshi" OR "Hima P. Karanam" OR "Marvin Mendelssohn" OR "Mukesh K. Mohania" OR "Angel Smith" OR "L. V. Subramaniam" OR "Girish Venkatachaliah") AND inv:"Tanveer A. Faruquie" (page 1)

1.

Patent application
Systems and Methods for Discovering Synonymous Elements Using Context Over Multiple Similar Addresses (status: lapsed)

Publication number: US20110270808A1

Publication date: 2011-11-03

Application number: US12771543

Filing date: 2010-04-30

Applicants: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah

Inventors: Tanveer A. Faruquie, Sachindra Joshi, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, Angel Smith, L. V. Subramaniam, Girish Venkatachaliah

IPC classification: G06F17/30

CPC classification: G06F17/2735, G06F17/2795

Abstract: A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
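
As a rough illustration of the idea in this abstract, the sketch below clusters addresses on one feature and treats tokens that differ between otherwise identical addresses in the same cluster as candidate synonyms. The feature (trailing postal code), the one-token-difference rule, and all function names are assumptions made for this example; the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations

def cluster_by_feature(addresses, feature):
    """Group tokenized addresses by the value of a feature function."""
    clusters = defaultdict(list)
    for addr in addresses:
        clusters[feature(addr)].append(addr.lower().split())
    return clusters

def candidate_synonyms(clusters):
    """Count token pairs that are the only difference between two
    same-length addresses in one cluster (an assumed heuristic)."""
    pairs = defaultdict(int)
    for token_lists in clusters.values():
        for a, b in combinations(token_lists, 2):
            if len(a) != len(b):
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                pairs[tuple(sorted(diffs[0]))] += 1
    return dict(pairs)

def postal_code(addr):
    """Hypothetical feature: the last whitespace-separated token."""
    return addr.split()[-1]

addresses = [
    "12 MG Road Bangalore 560001",
    "12 MG Rd Bangalore 560001",
    "45 Park Street Kolkata 700016",
    "45 Park St Kolkata 700016",
    "7 Hill Road Mumbai 400050",
]
synonyms = candidate_synonyms(cluster_by_feature(addresses, postal_code))
```

The cluster supplies the context: "rd" and "road" are only paired because the rest of the two addresses agree.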

2.

Granted patent
Automatic selection of blocking column for de-duplication (status: lapsed)

Publication number: US08560505B2

Publication date: 2013-10-15

Application number: US13313518

Filing date: 2011-12-07

Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC classification: G06F7/00

CPC classification: G06F17/30303

Abstract: Blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
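
The ranking described in this abstract can be sketched as follows. The abstract does not give formulas, so the choices here are assumptions for illustration only: the first parameter is the normalized entropy of the block sizes, the second is the share of rows in the largest block, and blockability is their combination.

```python
import math
from collections import Counter

def block_sizes(rows, column_set):
    """Number of rows per block, where a block is one distinct key value."""
    return Counter(tuple(row[c] for c in column_set) for row in rows)

def distribution_parameter(sizes, n):
    """Assumed first parameter: normalized entropy of block sizes (0..1)."""
    if len(sizes) < 2:
        return 0.0
    entropy = -sum((s / n) * math.log(s / n) for s in sizes.values())
    return entropy / math.log(len(sizes))

def size_parameter(sizes, n):
    """Assumed second parameter: fraction of rows in the largest block."""
    return max(sizes.values()) / n

def blockability(rows, column_set):
    """Hypothetical score: even distribution, no oversized block."""
    sizes = block_sizes(rows, column_set)
    n = len(rows)
    return distribution_parameter(sizes, n) * (1.0 - size_parameter(sizes, n))

def rank_column_sets(rows, column_sets):
    """Column sets ordered from most to least blockable."""
    return sorted(column_sets, key=lambda cs: blockability(rows, cs), reverse=True)

rows = [
    {"city": "Delhi", "zip": "110001", "country": "IN"},
    {"city": "Delhi", "zip": "110001", "country": "IN"},
    {"city": "Delhi", "zip": "110002", "country": "IN"},
    {"city": "Pune", "zip": "411001", "country": "IN"},
    {"city": "Pune", "zip": "411001", "country": "IN"},
    {"city": "Pune", "zip": "411002", "country": "IN"},
]
ranking = rank_column_sets(rows, [("country",), ("city",), ("zip",)])
```

With this toy data the single-valued `country` column scores zero, since it puts every row in one block.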

3.

Patent application
RESOURCES MANAGEMENT IN DISTRIBUTED COMPUTING ENVIRONMENT (status: in force)

Publication number: US20110191781A1

Publication date: 2011-08-04

Application number: US12697228

Filing date: 2010-01-30

Applicants: Hima P. Karanam, Tanveer A. Faruquie, L. Venkata Subramaniam, Mukesh K. Mohania, Girish Venkatachaliah

Inventors: Hima P. Karanam, Tanveer A. Faruquie, L. Venkata Subramaniam, Mukesh K. Mohania, Girish Venkatachaliah

IPC classification: G06F9/50

CPC classification: G06F9/50

Abstract: A method, system and a computer program product for determining resources allocation in a distributed computing environment. An embodiment may include identifying resources in a distributed computing environment, computing provisioning parameters, computing configuration parameters and quantifying service parameters in response to a set of service level agreements (SLA). The embodiment may further include iteratively computing a completion time required for completion of the assigned task and a cost. Embodiments may further include computing an optimal resources configuration and computing at least one of an optimal completion time and an optimal cost corresponding to the optimal resources configuration. Embodiments may further include dynamically modifying the optimal resources configuration in response to at least one change in at least one of provisioning parameters, computing parameters and quantifying service parameters.
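
A toy version of the iterative time-and-cost computation might look like the sketch below. The cost model (a fixed setup time plus perfectly parallel work, billed per node-hour), the deadline standing in for an SLA, and every parameter name are assumptions for illustration, not details taken from the application.

```python
def plan(work_units, units_per_node_hour, setup_hours,
         price_per_node_hour, deadline_hours, max_nodes):
    """Try each node count, compute completion time and cost, and keep
    the cheapest configuration that meets the deadline."""
    best = None
    for nodes in range(1, max_nodes + 1):
        hours = setup_hours + work_units / (nodes * units_per_node_hour)
        cost = nodes * hours * price_per_node_hour
        if hours <= deadline_hours and (best is None or cost < best[2]):
            best = (nodes, hours, cost)
    return best  # (nodes, completion hours, cost), or None if infeasible

# Initial plan under a 13-hour deadline.
initial = plan(1200, 10, 0.5, 2.0, deadline_hours=13, max_nodes=20)

# "Dynamic modification": re-run the same search when a parameter
# changes, here a tighter 7-hour deadline.
replanned = plan(1200, 10, 0.5, 2.0, deadline_hours=7, max_nodes=20)
```

Under these assumed numbers the first plan uses 10 nodes; tightening the deadline forces the search to a larger, more expensive configuration.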

4.

Granted patent
Automatic selection of blocking column for de-duplication (status: lapsed)

Publication number: US08560506B2

Publication date: 2013-10-15

Application number: US13447726

Filing date: 2012-04-16

Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC classification: G06F7/00

CPC classification: G06F17/30303

Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.

5.

Patent application
Automatically Mining Patterns for Rule Based Data Standardization Systems (status: under examination, published)

Publication number: US20130238611A1

Publication date: 2013-09-12

Application number: US13415144

Filing date: 2012-03-08

Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC classification: G06F17/30

CPC classification: G06F17/30705, G06F17/2775, G06F17/30675, G06F2216/03, G06Q10/06, G06Q10/10, G06Q30/02

Abstract: Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
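
The mining and grouping steps in this abstract can be sketched in a few lines. Several things here are assumptions for illustration: sub-patterns are fixed-length runs of token classes, the distance is a position-wise mismatch count, and the grouping is a greedy pass in which a sub-pattern joins a group only if it is within D of every member.

```python
from collections import Counter

def sub_pattern_counts(records, length):
    """Count every contiguous run of `length` tokens across the records."""
    counts = Counter()
    for rec in records:
        toks = rec.split()
        for i in range(len(toks) - length + 1):
            counts[tuple(toks[i:i + length])] += 1
    return counts

def top_n(counts, n):
    """The N most frequent sub-patterns."""
    return [p for p, _ in counts.most_common(n)]

def distance(p, q):
    """Assumed distance: number of positions where two patterns differ."""
    return sum(a != b for a, b in zip(p, q))

def cluster(patterns, d):
    """Greedy grouping: join the first group whose members are all within d."""
    groups = []
    for p in patterns:
        for g in groups:
            if all(distance(p, q) <= d for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# Records already reduced to token classes (a hypothetical encoding).
records = [
    "NUM WORD ST CITY",
    "NUM WORD ST CITY",
    "NUM WORD RD CITY",
    "NUM WORD RD CITY",
    "PO BOX NUM CITY",
]
frequent = top_n(sub_pattern_counts(records, 3), 4)
groups = cluster(frequent, d=1)
```

In this toy run N is 4 and D is 1, and the number of groups K falls out of the grouping rather than being fixed in advance.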

6.

Patent application
Automatically Mining Patterns For Rule Based Data Standardization Systems (status: under examination, published)

Publication number: US20130238610A1

Publication date: 2013-09-12

Application number: US13414374

Filing date: 2012-03-07

Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Marvin Mendelssohn, Mukesh K. Mohania, L. Venkata Subramaniam

IPC classification: G06F17/30

CPC classification: G06F17/30705, G06F17/2775, G06F17/30675, G06F2216/03, G06Q10/06, G06Q10/10, G06Q30/02, Y04S10/54

Abstract: Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.

7.

Patent application
Cleansing a Database System to Improve Data Quality (status: under examination, published)

Publication number: US20120150825A1

Publication date: 2012-06-14

Application number: US12966281

Filing date: 2010-12-13

Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam

Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam

IPC classification: G06F17/30

CPC classification: G06F16/217, G06F16/215, G06F16/2462

Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
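
To make the flow in this abstract concrete, the sketch below computes two characteristics per column, feeds them to a simple quality model, and flags low-scoring columns. The characteristics (missing and malformed fractions), the linear model, the weights, and the threshold are all assumptions chosen for the example; the abstract does not name a specific model.

```python
import re

def characteristics(values, pattern):
    """Assumed characteristics: fraction missing and fraction malformed."""
    n = len(values)
    missing = sum(1 for v in values if v is None or v == "")
    malformed = sum(
        1 for v in values
        if v not in (None, "") and not re.fullmatch(pattern, v)
    )
    return {"missing": missing / n, "malformed": malformed / n}

def quality_estimate(chars, weights):
    """Hypothetical linear model: 1 minus the weighted defect fractions."""
    return 1.0 - sum(weights[k] * chars[k] for k in weights)

def problematic_columns(table, patterns, weights, threshold):
    """Columns whose estimated quality falls below the threshold."""
    flagged = []
    for column, values in table.items():
        score = quality_estimate(characteristics(values, patterns[column]), weights)
        if score < threshold:
            flagged.append(column)
    return flagged

table = {
    "phone": ["555-0100", "5550101", None, "555-0102"],
    "name": ["Ann", "Bob", "Cy", "Dee"],
}
patterns = {"phone": r"\d{3}-\d{4}", "name": r"[A-Za-z]+"}
weights = {"missing": 0.5, "malformed": 0.5}
flagged = problematic_columns(table, patterns, weights, threshold=0.9)
```

A cleansing job could then spend its effort on the flagged columns first, which is one way to read "the cleansing is adjusted".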

8.

Patent application
Cleansing a Database System to Improve Data Quality (status: under examination, published)

Publication number: US20120179658A1

Publication date: 2012-07-12

Application number: US13422280

Filing date: 2012-03-16

Applicants: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam

Inventors: Snigdha Chaturvedi, Tanveer A. Faruquie, Hima P. Karanam, Mukesh K. Mohania, L. Venkata Subramaniam

IPC classification: G06F7/00

CPC classification: G06F17/30306, G06F17/30303, G06F17/30536

Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.

9.

Granted patent
Rule set management (status: lapsed)

Publication number: US08700542B2

Publication date: 2014-04-15

Application number: US12969497

Filing date: 2010-12-15

Applicants: Mohan N. Dani, Tanveer A. Faruquie, Hima P. Karanam, L. Venkata Subramaniam, Girish Venkatachaliah

Inventors: Mohan N. Dani, Tanveer A. Faruquie, Hima P. Karanam, L. Venkata Subramaniam, Girish Venkatachaliah

IPC classification: G06F17/30

CPC classification: G06N5/025

Abstract: Systems, methods, and computer products for optimally managing large rule sets are disclosed. Rule dependencies of rules within a set of rules may be determined as a function of rules execution frequency data generated from applying the rules over a data set. The rules within the set of rules may be clustered into rules clusters based on the determined rule dependencies, in which the rules clusters comprise disjoint subsets of the rules within the set of rules. Cluster frequency data for the rules clusters may be used to arrive at an optimal ordering. Each rule within the set of rules may be assigned a unique identification that may capture an execution order of the rules within the set of rules.
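
One possible reading of this abstract is sketched below: rules that fire on the same records are treated as dependent, dependent rules are merged into disjoint clusters, clusters are ordered by how often their rules fire, and each rule gets an identifier that encodes that order. The co-firing test, the union-find clustering, the tie-breaking, and the "cluster.rule" identifier format are all assumptions for this example. Rules that never fire in the log are simply absent from the result.

```python
from collections import Counter
from itertools import combinations

def order_rules(firings, min_cofire=1):
    """firings: one set of fired rule names per processed record."""
    freq = Counter(r for fired in firings for r in fired)
    cofire = Counter()
    for fired in firings:
        for a, b in combinations(sorted(fired), 2):
            cofire[(a, b)] += 1

    # Union-find over rules; co-firing rules end up in one cluster.
    parent = {r: r for r in freq}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), n in cofire.items():
        if n >= min_cofire:
            parent[find(a)] = find(b)

    clusters = {}
    for r in freq:
        clusters.setdefault(find(r), []).append(r)

    # Most frequently firing clusters first; names break ties.
    ordered = sorted(clusters.values(),
                     key=lambda c: (-sum(freq[r] for r in c), min(c)))
    ids = {}
    for ci, members in enumerate(ordered, start=1):
        ranked = sorted(members, key=lambda r: (-freq[r], r))
        for ri, r in enumerate(ranked, start=1):
            ids[r] = f"{ci}.{ri}"
    return ids

firings = [
    {"strip_title", "split_name"},
    {"strip_title", "split_name"},
    {"strip_title"},
    {"expand_st"},
    {"expand_st", "fix_zip"},
]
rule_ids = order_rules(firings)
```

Sorting rules by these identifiers reproduces the execution order, which is the property the abstract asks of the unique identification.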

10.

Patent application
IN-QUERYING DATA CLEANSING WITH SEMANTIC STANDARDIZATION (status: under examination, published)

Publication number: US20130332408A1

Publication date: 2013-12-12

Application number: US13956024

Filing date: 2013-07-31

Applicants: Tanveer A. Faruquie, Mukesh K. Mohania, L. V. Subramaniam, Charles D. Wolfson

Inventors: Tanveer A. Faruquie, Mukesh K. Mohania, L. V. Subramaniam, Charles D. Wolfson

IPC classification: G06F17/30

CPC classification: G06F16/254, G06F16/215

Abstract: The present invention relates to data cleansing, and in particular performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system and computer program product for standardizing data within a database engine, configuring the standardization function to determine at least one standardized value for at least one data value by applying the standardization table in a context of at least one data value, receiving a database query identifying the standardization function, at least one database value and the context of the data, and invoking the standardization function.
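
The in-query standardization idea can be imitated with SQLite's user-defined functions, as in the sketch below. SQLite is used only because it ships with Python; the application does not name a database engine. The table contents, the `standardize` function name, and the normalization steps are assumptions for illustration.

```python
import sqlite3

# Hypothetical standardization table keyed by (context, raw value).
STANDARDIZATION_TABLE = {
    ("street_type", "st"): "Street",
    ("street_type", "rd"): "Road",
    ("street_type", "dr"): "Drive",
    ("title", "dr"): "Doctor",
}

def standardize(value, context):
    """Return the standardized form of value in the given context,
    or the value unchanged when the table has no entry."""
    if value is None:
        return None
    key = (context, value.strip().lower().rstrip("."))
    return STANDARDIZATION_TABLE.get(key, value)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can invoke it by name.
conn.create_function("standardize", 2, standardize)
conn.execute("CREATE TABLE contacts (title TEXT, street_type TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                 [("Dr.", "Dr"), ("Ms", "St.")])
rows = conn.execute(
    "SELECT standardize(title, 'title'), "
    "standardize(street_type, 'street_type') "
    "FROM contacts ORDER BY rowid"
).fetchall()
```

The same raw value "Dr" resolves to "Doctor" or "Drive" depending on the context argument passed in the query, which is the role the abstract gives to context.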
