Discovering transformations applied to a source table to generate a target table

    Publication number: US09720971B2

    Publication date: 2017-08-01

    Application number: US12165549

    Filing date: 2008-06-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30507

    Abstract: Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output. The second category pre-processing output is used to determine second category transformation rules with respect to at least one source table column and at least one target table column.
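    The two-phase flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the first-category rules are assumed to be single-column string functions (copy, upper, lower, strip), the second-category rules are assumed to be two-column concatenations with a separator, and tables are modeled as dicts of equal-length value lists. All names are hypothetical.

```python
from itertools import permutations

# A table is modeled (assumption) as column name -> list of string values,
# with rows aligned by list position.
Table = dict[str, list[str]]

# Hypothetical first-category transformations: one source column -> one target column.
UNARY = {"copy": lambda v: v, "upper": str.upper, "lower": str.lower, "strip": str.strip}
# Hypothetical separators tried by the second-category (concatenation) search.
SEPARATORS = ("", " ", "-", ",")


def first_category_rules(source: Table, target: Table) -> dict:
    """Predict target columns that equal a single transformed source column."""
    # Pre-processing: index each transformed source column by its value tuple.
    # setdefault keeps the first match, so "copy" wins over no-op transforms.
    index = {}
    for s_col, values in source.items():
        for name, fn in UNARY.items():
            index.setdefault(tuple(fn(v) for v in values), (name, s_col))
    # Rule determination: exact lookup of each target column's values.
    return {t_col: index[tuple(vals)]
            for t_col, vals in target.items() if tuple(vals) in index}


def second_category_rules(source: Table, target: Table, predicted: dict) -> dict:
    """Try two-column concatenations only for target columns phase one missed."""
    rules = {}
    for t_col, vals in target.items():
        if t_col in predicted:
            continue  # already explained by a first-category rule
        for a, b in permutations(source, 2):
            # Pre-processing: build the concatenated candidate for each separator.
            sep = next((s for s in SEPARATORS
                        if [f"{x}{s}{y}" for x, y in zip(source[a], source[b])] == vals),
                       None)
            if sep is not None:
                rules[t_col] = ("concat", a, b, sep)
                break
    return rules
```

    For example, with source columns `first` and `last` and a target column holding "ann lee", "bob kim", the first phase finds nothing for that column and the second phase reports `("concat", "first", "last", " ")`.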

    DISCOVERING TRANSFORMATIONS APPLIED TO A SOURCE TABLE TO GENERATE A TARGET TABLE
    2.
    Invention patent application
    Status: in force

    Publication number: US20090327208A1

    Publication date: 2009-12-31

    Application number: US12165549

    Filing date: 2008-06-30

    IPC classification: G06N5/04

    CPC classification: G06F17/30507

    Abstract: Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output. The second category pre-processing output is used to determine second category transformation rules with respect to at least one source table column and at least one target table column.

    Using data mining algorithms including association rules and tree classifications to discover data rules
    3.
    Granted invention patent
    Status: in force

    Publication number: US07836004B2

    Publication date: 2010-11-16

    Application number: US11609307

    Filing date: 2006-12-11

    IPC classification: G06N5/02

    CPC classification: G06F17/30303

    Abstract: Provided are a method, system, and article of manufacture for using a data mining algorithm to discover data rules. A data set including multiple records is processed to generate data rules for the data set. Each record has a record format including a plurality of fields and each rule provides a predicted condition for one field based on at least one predictor condition in at least one other field. The generated data rules are provided to a user interface to enable a user to edit the generated data rules. The data rules are stored in a rule repository to be available to use to validate data sets having the record format.
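    As a rough illustration of the association-rule side of this idea (the tree-classification part named in the title is not shown), the Python sketch below mines single-condition rules of the form `field_a == x -> field_b == y` by counting, keeps those that meet assumed support and confidence thresholds, and then flags records that break a rule. The thresholds, function names, and dict-based record format are assumptions; a full miner such as Apriori would also produce rules with multi-condition antecedents.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=2, min_confidence=0.9):
    """Mine rules 'if field a == x then field b == y' from a list of dict records."""
    rules = []
    for a, b in permutations(records[0].keys(), 2):
        antecedent = Counter(r[a] for r in records)    # how often a == x
        joint = Counter((r[a], r[b]) for r in records)  # how often a == x and b == y
        for (x, y), n in joint.items():
            confidence = n / antecedent[x]
            if n >= min_support and confidence >= min_confidence:
                rules.append({"if": (a, x), "then": (b, y),
                              "support": n, "confidence": confidence})
    return rules


def violations(records, rules):
    """Return (record index, rule) pairs where the predictor holds but the prediction fails."""
    found = []
    for i, r in enumerate(records):
        for rule in rules:
            (a, x), (b, y) = rule["if"], rule["then"]
            if r.get(a) == x and r.get(b) != y:
                found.append((i, rule))
    return found
```

    Here the returned `rules` list stands in for the rule repository; in the described system a user would be able to review and edit it through a user interface before it is stored.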

    Using a data mining algorithm to generate rules used to validate a selected region of a predicted column
    4.
    Granted invention patent
    Status: lapsed

    Publication number: US08171001B2

    Publication date: 2012-05-01

    Application number: US11769634

    Filing date: 2007-06-27

    IPC classification: G06F17/00

    CPC classification: G06N5/022 G06F17/30303

    Abstract: Provided are an article of manufacture, system, and method for using a data mining algorithm to generate rules used to validate a selected region of a predicted column. A data set has a plurality of columns and records providing data for each of the columns. Selection is received of at least one predicted column for which rules are to be generated and at least one region of the selected at least one predicted column, wherein each region specifies data positions in the column. The data set is processed to determine association relationships among data in at least one predictor column and subsequences in the selected at least one region of the at least one predicted column. At least one rule is generated from the relationships specifying a condition involving at least one predictor column that predicts at least one value in the selected region of the at least one predicted column.
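    A minimal Python sketch of region-restricted rules, assuming a region is a half-open character range `[start, end)` in the predicted column and that a rule maps one predictor value to the subsequence expected in that region. The function names, the single-predictor restriction, and the confidence threshold are illustrative assumptions, not the claimed procedure.

```python
from collections import Counter, defaultdict


def region_rules(records, predictor, predicted, region, min_confidence=1.0):
    """Map each predictor value to the subsequence it predicts in the selected region."""
    start, end = region  # half-open character positions [start, end)
    seen = defaultdict(Counter)
    for r in records:
        seen[r[predictor]][r[predicted][start:end]] += 1
    rules = {}
    for value, subsequences in seen.items():
        best, n = subsequences.most_common(1)[0]
        if n / sum(subsequences.values()) >= min_confidence:
            rules[value] = best
    return rules


def satisfies(record, predictor, predicted, region, rules):
    """True if no rule applies or the region matches the predicted subsequence."""
    start, end = region
    expected = rules.get(record[predictor])
    return expected is None or record[predicted][start:end] == expected
```

    With region `(0, 3)` on a phone-number column, a rule such as `"DE" -> "+49"` validates only the country-code prefix and ignores the rest of the value.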

    Using a data mining algorithm to generate format rules used to validate data sets
    5.
    Granted invention patent
    Status: lapsed

    Publication number: US08166000B2

    Publication date: 2012-04-24

    Application number: US11769639

    Filing date: 2007-06-27

    IPC classification: G06F17/00

    CPC classification: G06N5/025 G06F17/30303

    Abstract: Provided are a method, system, and article of manufacture for using a data mining algorithm to generate format rules used to validate data sets. A data set has a plurality of columns and records providing data for each of the columns. Selection is received of at least one format column for which format rules are to be generated, and selection is received of at least one predictor column. A format mask column is generated for each selected format column. For records in the data set, a value in the at least one format column is converted to a format mask representing a format of the value in the format column, and the format mask is stored in the format mask column in the record for which the format mask was generated. The at least one predictor column and the at least one format mask column are processed to generate at least one format rule. Each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
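    The mask-then-mine pipeline can be sketched in Python as follows. The mask alphabet (`A` for any letter, `9` for any digit, other characters kept as-is), the `_mask` column suffix, the dominant-mask rule form, and all function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def format_mask(value: str) -> str:
    """Assumed mask alphabet: letters -> 'A', digits -> '9', everything else kept."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def add_mask_column(records, format_col):
    """Store each record's mask in a new format mask column of that same record."""
    mask_col = f"{format_col}_mask"
    for r in records:
        r[mask_col] = format_mask(r[format_col])
    return mask_col


def format_rules(records, predictor_col, mask_col, min_confidence=0.8):
    """For each predictor value, keep the dominant mask if it is frequent enough."""
    masks = defaultdict(Counter)
    for r in records:
        masks[r[predictor_col]][r[mask_col]] += 1
    rules = {}
    for value, counts in masks.items():
        mask, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= min_confidence:
            rules[value] = mask
    return rules
```

    Mining over masks rather than raw values lets one rule cover many distinct values: every US postal code in the sample collapses to `99999`, while a Canadian code such as `K1A 0B1` becomes `A9A 9A9`.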

    USING A DATA MINING ALGORITHM TO GENERATE FORMAT RULES USED TO VALIDATE DATA SETS
    6.
    Invention patent application
    Status: lapsed

    Publication number: US20090006283A1

    Publication date: 2009-01-01

    Application number: US11769639

    Filing date: 2007-06-27

    IPC classification: G06F15/18

    CPC classification: G06N5/025 G06F17/30303

    Abstract: Provided are a method, system, and article of manufacture for using a data mining algorithm to generate format rules used to validate data sets. A data set has a plurality of columns and records providing data for each of the columns. Selection is received of at least one format column for which format rules are to be generated, and selection is received of at least one predictor column. A format mask column is generated for each selected format column. For records in the data set, a value in the at least one format column is converted to a format mask representing a format of the value in the format column, and the format mask is stored in the format mask column in the record for which the format mask was generated. The at least one predictor column and the at least one format mask column are processed to generate at least one format rule. Each format rule specifies a format mask associated with at least one condition in the at least one predictor column.

    Managing validation models and rules to apply to data sets
    7.
    Granted invention patent
    Status: in force

    Publication number: US08401987B2

    Publication date: 2013-03-19

    Application number: US11779251

    Filing date: 2007-07-17

    IPC classification: G06N5/00

    Abstract: Provided are a method, system, and article of manufacture for managing validation models and rules to apply to data sets. A schema definition describing a structure of at least one column in a first data set having a plurality of columns and records providing data for each of the columns is received. At least one model is generated, wherein each model asserts conditions for at least one column in a record of the first data set. The schema definition and the at least one model are stored in a data quality model. Selection is received of a second data set and the data quality model. A determination is made as to whether a structure of the second data set is compatible with the schema definition in the selected data quality model. Each model in the data quality model is applied to the records in the second data set to validate the records in the second data set in response to determining that the structure of the second data set and the schema definition are compatible.
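    The compatibility check followed by model application can be sketched as a small Python class. Here the schema definition is assumed to be a mapping from column name to Python type, each model is assumed to be a predicate over one record, and compatibility is assumed to mean that every schema column is present with the same type (extra columns are allowed). `DataQualityModel` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]


@dataclass
class DataQualityModel:
    schema: dict[str, type]  # assumed schema definition: column name -> type
    models: list[Callable[[Record], bool]] = field(default_factory=list)

    def is_compatible(self, columns: dict[str, type]) -> bool:
        """Assumed rule: every schema column exists in the new data set with the same type."""
        return all(columns.get(name) is typ for name, typ in self.schema.items())

    def validate(self, columns: dict[str, type], records: list[Record]) -> list[int]:
        """Apply every model to every record; return indexes of failing records."""
        if not self.is_compatible(columns):
            raise ValueError("data set structure is not compatible with the schema definition")
        return [i for i, r in enumerate(records) if not all(m(r) for m in self.models)]
```

    Keeping the schema and the models in one object mirrors the abstract's point that a data quality model can be reused on a second data set only after its structure has been checked.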

    Common interface to access catalog information from heterogeneous databases
    10.
    Granted invention patent
    Status: in force

    Publication number: US08051094B2

    Publication date: 2011-11-01

    Application number: US12249957

    Filing date: 2008-10-12

    IPC classification: G06F7/00 G06F17/30

    Abstract: Various embodiments of a system and computer program product to access metadata from a plurality of data servers from a federated database management system are provided. In one embodiment, a request for metadata, from a client application, is received by the federated database management system. Data servers which are accessible from the federated database management system are identified. For each data server, metadata describing data of a data source of that data server is retrieved in accordance with the application request. The retrieved metadata from each of the data servers is aggregated to produce an aggregated result in a uniform format. The aggregated result is provided. In another embodiment, for each data server, a source metadata request for metadata of that data server is generated in accordance with the application request and a source metadata application programming interface. A view is created based on the source metadata request for metadata for each data server.
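    A rough Python sketch of the first embodiment's aggregation flow, assuming each data server is represented by a hypothetical `DataServer` object holding a native catalog query and a normalizer that maps its native rows to one uniform `TableMetadata` record. The catalog queries in the test are invented stubs, and the view-creation embodiment is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TableMetadata:
    """Uniform result row returned to the client application."""
    server: str
    schema: str
    table: str


@dataclass
class DataServer:
    name: str
    fetch: Callable[[str], list[dict]]            # native catalog query for a schema
    normalize: Callable[[dict], tuple[str, str]]  # native row -> (schema, table)


def aggregate_metadata(servers: list[DataServer], schema: str) -> list[TableMetadata]:
    """Query every reachable server, normalize each row, and merge the results."""
    result = []
    for server in servers:
        for row in server.fetch(schema):
            s, t = server.normalize(row)
            result.append(TableMetadata(server.name, s, t))
    # Deterministic order so the aggregated result does not depend on server order.
    return sorted(result, key=lambda m: (m.server, m.schema, m.table))
```

    The per-server `normalize` function is where heterogeneous catalog layouts are reconciled, so the client sees a single format no matter which data server answered.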