RESOURCES MANAGEMENT IN DISTRIBUTED COMPUTING ENVIRONMENT
    3.
    Type: Invention application
    Status: In force

    Publication No.: US20110191781A1

    Publication date: 2011-08-04

    Application No.: US12697228

    Filing date: 2010-01-30

    IPC classification: G06F9/50

    CPC classification: G06F9/50

    Abstract: A method, system, and computer program product for determining resource allocation in a distributed computing environment. An embodiment may include identifying resources in a distributed computing environment, computing provisioning parameters, computing configuration parameters, and quantifying service parameters in response to a set of service level agreements (SLAs). The embodiment may further include iteratively computing a completion time required for completion of the assigned task and a cost. Embodiments may further include computing an optimal resource configuration and computing at least one of an optimal completion time and an optimal cost corresponding to the optimal resource configuration. Embodiments may further include dynamically modifying the optimal resource configuration in response to at least one change in at least one of the provisioning parameters, computing parameters, and quantifying service parameters.
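
The iterative computation of completion time and cost described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give a cost or timing model, so `completion_hours`, `cheapest_plan`, the linear overhead term, and all parameter names are hypothetical and are not the patented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    nodes: int    # number of identical machines (hypothetical configuration parameter)
    hours: float  # estimated completion time for this configuration
    cost: float   # estimated cost for this configuration


def completion_hours(work_units: float, nodes: int,
                     rate_per_node: float, overhead_per_node: float) -> float:
    # Toy model (assumption): perfectly parallel work plus a coordination
    # overhead that grows linearly with the node count.
    return work_units / (nodes * rate_per_node) + overhead_per_node * nodes


def cheapest_plan(work_units: float, deadline_hours: float, max_nodes: int,
                  rate_per_node: float = 10.0, overhead_per_node: float = 0.05,
                  price_per_node_hour: float = 1.0) -> Optional[Plan]:
    """Try each candidate node count and keep the cheapest one that meets
    the deadline, which stands in for an SLA-derived service parameter."""
    best: Optional[Plan] = None
    for nodes in range(1, max_nodes + 1):
        hours = completion_hours(work_units, nodes, rate_per_node, overhead_per_node)
        if hours > deadline_hours:
            continue  # this configuration would violate the deadline
        cost = hours * nodes * price_per_node_hour
        if best is None or cost < best.cost:
            best = Plan(nodes, hours, cost)
    return best  # None when no configuration meets the deadline
```

Calling `cheapest_plan` again whenever a price, rate, or deadline changes is one simple way to mimic the dynamic modification step mentioned in the abstract.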

    Automatic selection of blocking column for de-duplication
    4.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08560506B2

    Publication date: 2013-10-15

    Application No.: US13447726

    Filing date: 2012-04-16

    IPC classification: G06F7/00

    CPC classification: G06F17/30303

    Abstract: A method of blocking column selection can include determining a first parameter for each column set of a plurality of column sets, wherein the first parameter indicates distribution of blocks in the column set, and determining a second parameter for each column set. The second parameter can indicate block size for the column set. For each column set, a measure of blockability that is dependent upon at least the first parameter and the second parameter can be calculated using a processor. The plurality of column sets can be ranked according to the measures of blockability.
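
A rough sketch of ranking column sets by a blockability measure follows. The abstract does not define the two parameters, so the choices below are assumptions: normalized entropy stands in for the block distribution, and the share of rows in blocks of a usable size stands in for the block-size parameter. The function names and the `max_block_size` default are hypothetical.

```python
import math
from collections import Counter


def blockability(rows, column_set, max_block_size=3):
    """Score a column set as a blocking key; higher is better."""
    if not rows:
        return 0.0
    n = len(rows)
    # Rows sharing the same values on the column set form one block.
    sizes = list(Counter(tuple(row[c] for c in column_set) for row in rows).values())

    # First parameter (assumed): evenness of the block distribution,
    # measured as normalized entropy in [0, 1].
    if len(sizes) > 1:
        entropy = -sum((s / n) * math.log(s / n) for s in sizes)
        evenness = entropy / math.log(len(sizes))
    else:
        evenness = 0.0

    # Second parameter (assumed): share of rows in blocks that are neither
    # singletons (nothing to compare) nor larger than max_block_size.
    usable = sum(s for s in sizes if 2 <= s <= max_block_size) / n

    return evenness * usable


def rank_column_sets(rows, column_sets, max_block_size=3):
    """Return the column sets ordered from most to least blockable."""
    return sorted(column_sets,
                  key=lambda cs: blockability(rows, cs, max_block_size),
                  reverse=True)
```

Under these assumptions, a unique identifier column and a constant column both score zero, because neither produces blocks in which candidate duplicates can be compared at reasonable cost.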

    Cleansing a Database System to Improve Data Quality
    7.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20120150825A1

    Publication date: 2012-06-14

    Application No.: US12966281

    Filing date: 2010-12-13

    IPC classification: G06F17/30

    Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.
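
The flow from feature characteristics to quality estimates to problematic data can be sketched as below. The abstract names no specific characteristics or model, so the missing and invalid ratios, the weighted-penalty model, the threshold, and all function names are assumptions chosen only to make the steps concrete.

```python
def column_characteristics(values, is_valid):
    """Compute characteristic values for one selected feature (column)."""
    n = len(values)
    if n == 0:
        return {"missing_ratio": 0.0, "invalid_ratio": 0.0}
    missing = sum(1 for v in values if v in (None, ""))
    invalid = sum(1 for v in values if v not in (None, "") and not is_valid(v))
    return {"missing_ratio": missing / n, "invalid_ratio": invalid / n}


def estimate_quality(chars, weights=None):
    """Hypothetical quality model: 1 minus a weighted sum of the defect ratios."""
    weights = weights or {"missing_ratio": 0.5, "invalid_ratio": 0.5}
    penalty = sum(weights[name] * chars[name] for name in weights)
    return max(0.0, 1.0 - penalty)


def problematic_columns(table, validators, threshold=0.8):
    """Return the columns whose estimated quality falls below the threshold.

    table maps column name -> list of values; validators maps column name ->
    predicate. Columns without a validator treat every present value as valid.
    """
    flagged = []
    for column, values in table.items():
        chars = column_characteristics(values, validators.get(column, lambda v: True))
        if estimate_quality(chars) < threshold:
            flagged.append(column)
    return flagged
```

A cleansing job could then spend its effort on the flagged columns first, which is one reading of adjusting the cleansing to accommodate the problematic data.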

    Rule set management
    9.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08700542B2

    Publication date: 2014-04-15

    Application No.: US12969497

    Filing date: 2010-12-15

    IPC classification: G06F17/30

    CPC classification: G06N5/025

    Abstract: Systems, methods, and computer products for optimally managing large rule sets are disclosed. Rule dependencies of rules within a set of rules may be determined as a function of rules execution frequency data generated from applying the rules over a data set. The rules within the set of rules may be clustered into rules clusters based on the determined rule dependencies, in which the rules clusters comprise disjoint subsets of the rules within the set of rules. Cluster frequency data for the rules clusters may be used to arrive at an optimal ordering. Each rule within the set of rules may be assigned a unique identification that may capture an execution order of the rules within the set of rules.
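
One way to picture the clustering and ordering steps is the sketch below. It assumes that two rules are dependent when they fire together on at least `min_cofire` records, that clusters are the connected components of that dependency relation, and that more frequently firing clusters should run first. None of these choices comes from the abstract, and the identifier format is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_and_order(firings, min_cofire=2):
    """firings: list of sets, each holding the rule names that fired on one record.

    Returns a dict mapping rule name -> identifier "CC.RR", where CC is the
    cluster's position in the ordering and RR the rule's position in its cluster.
    """
    rule_freq = Counter(rule for fired in firings for rule in fired)
    pair_freq = Counter(pair for fired in firings
                        for pair in combinations(sorted(fired), 2))

    # Union-find over rules: frequently co-firing rules end up in one cluster,
    # so the resulting clusters are disjoint subsets of the rule set.
    parent = {rule: rule for rule in rule_freq}

    def find(rule):
        while parent[rule] != rule:
            parent[rule] = parent[parent[rule]]  # path halving
            rule = parent[rule]
        return rule

    for (a, b), count in pair_freq.items():
        if count >= min_cofire:
            parent[find(a)] = find(b)

    clusters = {}
    for rule in rule_freq:
        clusters.setdefault(find(rule), []).append(rule)

    def cluster_freq(members):
        # Number of records on which at least one rule of the cluster fired.
        member_set = set(members)
        return sum(1 for fired in firings if fired & member_set)

    ordered = sorted(clusters.values(),
                     key=lambda members: (-cluster_freq(members), sorted(members)))

    ids = {}
    for c_pos, members in enumerate(ordered, start=1):
        by_freq = sorted(members, key=lambda r: (-rule_freq[r], r))
        for r_pos, rule in enumerate(by_freq, start=1):
            ids[rule] = f"{c_pos:02d}.{r_pos:02d}"
    return ids
```

Sorting the rules by their identifiers then reproduces the execution order, which is the property the abstract attributes to the unique identification.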

    IN-QUERYING DATA CLEANSING WITH SEMANTIC STANDARDIZATION
    10.
    Type: Invention application
    Status: Under examination (published)

    Publication No.: US20130332408A1

    Publication date: 2013-12-12

    Application No.: US13956024

    Filing date: 2013-07-31

    IPC classification: G06F17/30

    CPC classification: G06F16/254, G06F16/215

    Abstract: The present invention relates to data cleansing, and in particular to performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system, and computer program product for standardizing data within a database engine by configuring a standardization function to determine at least one standardized value for at least one data value by applying a standardization table in a context of the at least one data value, receiving a database query identifying the standardization function, at least one database value, and the context of the data, and invoking the standardization function.
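
The idea of a query that names a standardization function, a value, and a context can be illustrated with SQLite's user-defined functions. This is only a sketch under assumptions: the patent does not name a database engine, and the table contents, the `standardize` function, and the `addr` schema are all made up for the example.

```python
import sqlite3

# Hypothetical standardization table: (context, raw value) -> standardized value.
# The same raw token can map to different values in different contexts.
STANDARDIZATION_TABLE = {
    ("street", "st"): "Street",
    ("title", "st"): "Saint",
    ("street", "ave"): "Avenue",
}


def standardize(value, context):
    """Look up the standardized form of value in the given context.

    Unknown values are returned unchanged; NULL stays NULL.
    """
    if value is None:
        return None
    key = value.strip().lower().rstrip(".")
    return STANDARDIZATION_TABLE.get((context, key), value)


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL queries can invoke it inside the engine.
    conn.create_function("standardize", 2, standardize)
    conn.execute("CREATE TABLE addr (suffix TEXT)")
    conn.executemany("INSERT INTO addr VALUES (?)", [("St.",), ("ave",), ("Blvd",)])
    rows = conn.execute(
        "SELECT standardize(suffix, 'street') FROM addr ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Because the function runs inside the query, the values arrive standardized before any later transform step sees them, which matches the ordering the abstract describes.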
