Granted invention patent
US09064047B2 Parallel processing of ETL jobs involving extensible markup language documents
In force
- Title: Parallel processing of ETL jobs involving extensible markup language documents
- Application number: US12566255
- Filing date: 2009-09-24
- Publication number: US09064047B2
- Publication date: 2015-06-23
- Inventors: Manoj K. Agarwal, Manish A. Bhide, Srilakshmi Kotwal, Srinivas Kiran Mittapalli, Sriram Padmanabhan
- Applicants: Manoj K. Agarwal, Manish A. Bhide, Srilakshmi Kotwal, Srinivas Kiran Mittapalli, Sriram Padmanabhan
- Applicant address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: US NY Armonk
- Agent: Ryan, Mason & Lewis, LLP
- Main classification: G06F17/00
- IPC classifications: G06F17/00; G06F11/36; G06F17/30
Abstract:
Techniques for running an Extract Transform Load (ETL) job in parallel on one or more processors wherein the ETL job comprises use of an extensible markup language (XML) document are provided. The techniques include receiving an XML document input, identifying a node in the XML document at which partitioning of the XML document is to begin, sending partition information to each respective processor, performing a shallow parsing of the XML document in parallel on the one or more processors, wherein each processor performs shallow parsing using the identified partition node until it reaches its identified partition, using the shallow parsing to generate the partition of the input XML document, wherein each processor generates a different partition of the same XML document, and sending each partition in streaming format to an ETL job instance.
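The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the partition node is a repeated element whose instances are not nested, that partition information is a contiguous index range per worker, and that threads stand in for processors. The standard-library `iterparse` (a full streaming parser) stands in for the shallow parser, and every function name, along with the `id` field, is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def count_partition_nodes(xml_bytes, partition_tag):
    """Count occurrences of the partition node (assumed extra pass)."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("end",))
    return sum(1 for _event, elem in events if elem.tag == partition_tag)


def partition_ranges(total, num_workers):
    """Split `total` node indices into contiguous (start, stop) ranges."""
    base, extra = divmod(total, num_workers)
    ranges, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def stream_partition(xml_bytes, partition_tag, start, stop):
    """Scan the same document and yield only this worker's partition.

    Nodes before `start` are skipped without being handed on, and the
    scan ends as soon as the partition is complete.
    """
    index = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag != partition_tag:
            continue
        if index >= stop:
            break
        if index >= start:
            yield ET.tostring(elem, encoding="unicode").strip()
        index += 1
        elem.clear()  # release the subtree once it has been emitted or skipped


def etl_job_instance(records):
    """Hypothetical ETL job instance: extract the <id> of each streamed record."""
    return [ET.fromstring(record).findtext("id") for record in records]


def run_parallel(xml_bytes, partition_tag, num_workers):
    """Give each worker its partition information and collect the results."""
    total = count_partition_nodes(xml_bytes, partition_tag)
    ranges = partition_ranges(total, num_workers)

    def work(bounds):
        start, stop = bounds
        return etl_job_instance(
            stream_partition(xml_bytes, partition_tag, start, stop)
        )

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(work, ranges))


if __name__ == "__main__":
    doc = (
        b"<orders>"
        b"<order><id>1</id></order><order><id>2</id></order>"
        b"<order><id>3</id></order><order><id>4</id></order>"
        b"<order><id>5</id></order>"
        b"</orders>"
    )
    print(run_parallel(doc, "order", 2))  # [['1', '2', '3'], ['4', '5']]
```

Each worker reads the same input and produces a different partition, which mirrors the abstract's description. A real shallow parser would skip the content of nodes outside its partition more cheaply than `iterparse` does here.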