Star and snowflake schemas in extract, transform, load processes

    Publication No.: US09298787B2

    Publication date: 2016-03-29

    Application No.: US13292234

    Filing date: 2011-11-09

    IPC classification: G06F17/30

    Abstract: A computer-implemented method, computer program product and a system for supporting star and snowflake data schemas for use with an Extract, Transform, Load (ETL) process, comprising selecting a data source comprising dimensional data, where the dimensional data comprises at least one source table comprising at least one source column, importing a data model for the dimensional data into a data integration system, analyzing the imported data model to select a star or snowflake target data schema comprising target dimensions and target facts, generating a meta-model representation by mapping at least one source table or source column to each target fact and target dimension, automatically converting the meta-model representation into one or more ETL jobs, and executing the ETL jobs to extract the dimensional data from the data source and loading the dimensional data into the selected target data schema in a target data system.
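
    The mapping and job-generation steps in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Mapping` structure standing in for the meta-model, the table and column names, the use of SQLite as both source and target, and the choice to represent an "ETL job" as one SQL statement.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Mapping:
    """One meta-model entry (hypothetical): a target table and the source columns feeding it."""
    target_table: str
    kind: str          # "dimension" or "fact"
    source_table: str
    columns: dict      # target column name -> source column name


def to_etl_job(m: Mapping) -> str:
    """Convert one mapping into an ETL job, represented here as a single SQL statement."""
    targets = ", ".join(m.columns.keys())
    sources = ", ".join(m.columns.values())
    # Dimensions hold one row per distinct member; facts keep every source row.
    distinct = "DISTINCT " if m.kind == "dimension" else ""
    return (f"INSERT INTO {m.target_table} ({targets}) "
            f"SELECT {distinct}{sources} FROM {m.source_table}")


def run_jobs(conn: sqlite3.Connection, mappings: list) -> None:
    """Execute the generated jobs, loading dimensions before facts."""
    for m in sorted(mappings, key=lambda m: m.kind != "dimension"):
        conn.execute(to_etl_job(m))
    conn.commit()


def demo() -> tuple:
    """Load a tiny star schema and return (dimension row count, fact row count)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_src (product TEXT, category TEXT, qty INTEGER);
        INSERT INTO sales_src VALUES
            ('pen', 'office', 2), ('pen', 'office', 5), ('mug', 'kitchen', 1);
        CREATE TABLE dim_product (name TEXT, category TEXT);
        CREATE TABLE fact_sales (product_name TEXT, qty INTEGER);
    """)
    mappings = [
        Mapping("fact_sales", "fact", "sales_src",
                {"product_name": "product", "qty": "qty"}),
        Mapping("dim_product", "dimension", "sales_src",
                {"name": "product", "category": "category"}),
    ]
    run_jobs(conn, mappings)
    dims = conn.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0]
    facts = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    return dims, facts
```

    Calling `demo()` loads one dimension table and one fact table from a single source table. A snowflake target would add further mappings that split a dimension into normalized sub-tables.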

    Slowly Changing Dimension Attributes in Extract, Transform, Load Processes
    2.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20130124454A1

    Publication date: 2013-05-16

    Application No.: US13618158

    Filing date: 2012-09-14

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30563

    Abstract: A computer-implemented method, computer program product and a system for identifying and handling slowly changing dimension (SCD) attributes for use with an Extract, Transform, Load (ETL) process, comprising importing a data model for dimensional data into a data integration system, where the dimensional data comprises a plurality of attributes, identifying via a data discovery analyzer one or more attributes in the data model as SCD attributes, importing the identified SCD attributes into the data integration system, selecting a data source comprising dimensional data, automatically generating an ETL job for the dimensional data utilizing the imported SCD attributes, and executing the automatically generated ETL to extract the dimensional data from the data source and loading the dimensional data into the imported SCD attributes in a target data system.
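
    As a rough illustration of what a load step might do once SCD attributes have been identified, the sketch below applies the common "Type 2" convention, in which a change to an SCD attribute closes the current row and appends a new version. The abstract does not say which SCD type is used, so Type 2 is an assumption, as are the `DimRow` structure and the `apply_scd2` function name.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (hypothetical layout)."""
    key: str
    attrs: dict
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_scd2(rows, key, new_attrs, scd_attrs, today):
    """Return the dimension rows after loading one incoming record.

    If any attribute listed in scd_attrs differs from the current version,
    the current version is closed and a new version is appended.
    """
    out = []
    found = False
    changed = False
    for row in rows:
        if row.key == key and row.valid_to is None:
            found = True
            if any(row.attrs.get(a) != new_attrs.get(a) for a in scd_attrs):
                out.append(replace(row, valid_to=today))  # expire the old version
                changed = True
                continue
        out.append(row)
    if changed or not found:
        out.append(DimRow(key, dict(new_attrs), today))
    return out
```

    For example, loading `{"city": "Bergen"}` for a customer whose current row says `"Oslo"` leaves two rows: the Oslo row with `valid_to` set, and a new open-ended Bergen row.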

    STAR AND SNOWFLAKE SCHEMAS IN EXTRACT, TRANSFORM, LOAD PROCESSES
    3.
    Invention patent application
    Legal status: under examination (published)

    Publication No.: US20130117217A1

    Publication date: 2013-05-09

    Application No.: US13618282

    Filing date: 2012-09-14

    IPC classification: G06F17/30

    Abstract: A computer-implemented method, computer program product and a system for supporting star and snowflake data schemas for use with an Extract, Transform, Load (ETL) process, comprising selecting a data source comprising dimensional data, where the dimensional data comprises at least one source table comprising at least one source column, importing a data model for the dimensional data into a data integration system, analyzing the imported data model to select a star or snowflake target data schema comprising target dimensions and target facts, generating a meta-model representation by mapping at least one source table or source column to each target fact and target dimension, automatically converting the meta-model representation into one or more ETL jobs, and executing the ETL jobs to extract the dimensional data from the data source and loading the dimensional data into the selected target data schema in a target data system.

    STAR AND SNOWFLAKE SCHEMAS IN EXTRACT, TRANSFORM, LOAD PROCESSES

    Publication No.: US20130117216A1

    Publication date: 2013-05-09

    Application No.: US13292234

    Filing date: 2011-11-09

    IPC classification: G06F17/00 G06F7/00

    Abstract: A computer-implemented method, computer program product and a system for supporting star and snowflake data schemas for use with an Extract, Transform, Load (ETL) process, comprising selecting a data source comprising dimensional data, where the dimensional data comprises at least one source table comprising at least one source column, importing a data model for the dimensional data into a data integration system, analyzing the imported data model to select a star or snowflake target data schema comprising target dimensions and target facts, generating a meta-model representation by mapping at least one source table or source column to each target fact and target dimension, automatically converting the meta-model representation into one or more ETL jobs, and executing the ETL jobs to extract the dimensional data from the data source and loading the dimensional data into the selected target data schema in a target data system.

    Parallel Processing of ETL Jobs Involving Extensible Markup Language Documents
    5.
    Invention patent application
    Legal status: in force

    Publication No.: US20110072319A1

    Publication date: 2011-03-24

    Application No.: US12566255

    Filing date: 2009-09-24

    IPC classification: G06F9/46 G06F17/00 G06F11/07

    CPC classification: G06F11/3604 G06F17/30917

    Abstract: Techniques for running an Extract Transform Load (ETL) job in parallel on one or more processors wherein the ETL job comprises use of an extensible markup language (XML) document are provided. The techniques include receiving an XML document input, identifying a node in the XML document at which partitioning of the XML document is to begin, sending partition information to each respective processor, performing a shallow parsing of the XML document in parallel on the one or more processors, wherein each processor performs shallow parsing using the identified partition node until it reaches its identified partition, using the shallow parsing to generate the partition of the input XML document, wherein each processor generates a different partition of the same XML document, and sending each partition in streaming format to an ETL job instance.
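
    The partitioning idea can be sketched loosely as follows: each worker scans the same document, skips occurrences of the partition node that belong to other workers, and streams out only its own slice. This sketch makes several assumptions that are not in the abstract: the partition information is a half-open index range `(start, stop)` over occurrences of the partition tag, the partition tag is not nested inside itself, and the standard library's `iterparse` stands in for the shallow parser (it still tokenizes the whole prefix, so it is not a faithful shallow parse). The function name `stream_partition` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_partition(xml_bytes: bytes, tag: str, start: int, stop: int):
    """Yield the serialized `tag` elements whose occurrence index is in [start, stop).

    Elements before the partition are skipped and discarded, and scanning
    stops as soon as the end of the partition is reached.
    """
    index = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag != tag:
            continue
        if start <= index < stop:
            elem.tail = None  # drop inter-element whitespace from the output
            yield ET.tostring(elem, encoding="unicode")
        elem.clear()          # release the content of elements already handled
        index += 1
        if index >= stop:
            break
```

    With four `order` elements and two workers, one worker would call `stream_partition(doc, "order", 0, 2)` and the other `stream_partition(doc, "order", 2, 4)`, each feeding its output to a separate ETL job instance.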

    Parallel processing of ETL jobs involving extensible markup language documents
    6.
    Granted invention patent
    Legal status: in force

    Publication No.: US09064047B2

    Publication date: 2015-06-23

    Application No.: US12566255

    Filing date: 2009-09-24

    IPC classification: G06F17/00 G06F11/36 G06F17/30

    CPC classification: G06F11/3604 G06F17/30917

    Abstract: Techniques for running an Extract Transform Load (ETL) job in parallel on one or more processors wherein the ETL job comprises use of an extensible markup language (XML) document are provided. The techniques include receiving an XML document input, identifying a node in the XML document at which partitioning of the XML document is to begin, sending partition information to each respective processor, performing a shallow parsing of the XML document in parallel on the one or more processors, wherein each processor performs shallow parsing using the identified partition node until it reaches its identified partition, using the shallow parsing to generate the partition of the input XML document, wherein each processor generates a different partition of the same XML document, and sending each partition in streaming format to an ETL job instance.

    Star and snowflake schemas in extract, transform, load processes
    7.
    Granted invention patent
    Legal status: in force

    Publication No.: US09323815B2

    Publication date: 2016-04-26

    Application No.: US13618282

    Filing date: 2012-09-14

    IPC classification: G06F17/30

    Abstract: A computer-implemented method, computer program product and a system for supporting star and snowflake data schemas for use with an Extract, Transform, Load (ETL) process, comprising selecting a data source comprising dimensional data, where the dimensional data comprises at least one source table comprising at least one source column, importing a data model for the dimensional data into a data integration system, analyzing the imported data model to select a star or snowflake target data schema comprising target dimensions and target facts, generating a meta-model representation by mapping at least one source table or source column to each target fact and target dimension, automatically converting the meta-model representation into one or more ETL jobs, and executing the ETL jobs to extract the dimensional data from the data source and loading the dimensional data into the selected target data schema in a target data system.

    Slowly changing dimension attributes in extract, transform, load processes

    Publication No.: US09311368B2

    Publication date: 2016-04-12

    Application No.: US13618158

    Filing date: 2012-09-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30563

    Abstract: A computer-implemented method, computer program product and a system for identifying and handling slowly changing dimension (SCD) attributes for use with an Extract, Transform, Load (ETL) process, comprising importing a data model for dimensional data into a data integration system, where the dimensional data comprises a plurality of attributes, identifying via a data discovery analyzer one or more attributes in the data model as SCD attributes, importing the identified SCD attributes into the data integration system, selecting a data source comprising dimensional data, automatically generating an ETL job for the dimensional data utilizing the imported SCD attributes, and executing the automatically generated ETL to extract the dimensional data from the data source and loading the dimensional data into the imported SCD attributes in a target data system.

    Speed selective table scan operation
    10.
    Granted invention patent
    Legal status: lapsed

    Publication No.: US07937541B2

    Publication date: 2011-05-03

    Application No.: US11548889

    Filing date: 2006-10-12

    IPC classification: G06F12/06

    Abstract: Disclosed are a method, information processing system, and computer readable medium for scanning a storage medium table. The method includes retrieving location information associated with at least one other storage medium table scan. A storage medium table scan is started at a location within a storage medium table based on at least a location of the one other storage medium table scan. A weight is assigned to at least one storage medium block based on at least a current scanning location within the storage medium table relative to the location of the one other table scan. The method determines if a distance between the current scanning location and the location of the one other table scan is greater than a first given threshold. A current scanning operation is delayed, in response to the distance being greater than the given threshold, until the distance is below a second given threshold.
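
    The two-threshold delay rule at the end of this abstract behaves like a hysteresis switch, which the following sketch illustrates. The class name `ScanThrottle`, the field names, and the use of an absolute block distance are assumptions for illustration; block weighting and the choice of start location are not modeled.

```python
from dataclasses import dataclass


@dataclass
class ScanThrottle:
    """Decides whether a scan should pause, using two distance thresholds (hypothetical)."""
    pause_above: int     # first threshold: start delaying when the distance exceeds this
    resume_below: int    # second threshold: stop delaying once the distance drops below this
    paused: bool = False

    def should_wait(self, my_pos: int, other_pos: int) -> bool:
        """Update the paused state from the current positions and return it."""
        distance = abs(my_pos - other_pos)
        if self.paused:
            # Stay delayed until the other scan has come close enough again.
            if distance < self.resume_below:
                self.paused = False
        elif distance > self.pause_above:
            self.paused = True
        return self.paused
```

    With `ScanThrottle(pause_above=100, resume_below=20)`, a scan that is 150 blocks away from the other scan starts waiting and keeps waiting at a distance of 50; it resumes only once the distance falls under 20. Using a lower second threshold keeps the scan from toggling rapidly between running and waiting near a single cutoff.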
