Invention Patent Application
US20110072319A1 Parallel Processing of ETL Jobs Involving Extensible Markup Language Documents
Status: In force
- Title: Parallel Processing of ETL Jobs Involving Extensible Markup Language Documents
- Title (Chinese): 涉及可扩展标记语言文档的ETL作业的并行处理
- Application number: US12566255
- Filing date: 2009-09-24
- Publication number: US20110072319A1
- Publication date: 2011-03-24
- Inventors: Manoj K. Agarwal, Manish A. Bhide, Srilakshmi Kotwal, Srinivas Kiran Mittapalli, Sriram Padmanabhan
- Applicants: Manoj K. Agarwal, Manish A. Bhide, Srilakshmi Kotwal, Srinivas Kiran Mittapalli, Sriram Padmanabhan
- Applicant address: Armonk, NY, US
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee address: Armonk, NY, US
- Main classification: G06F9/46
- IPC classifications: G06F9/46; G06F17/00; G06F11/07
Abstract:
Techniques for running an Extract Transform Load (ETL) job in parallel on one or more processors wherein the ETL job comprises use of an extensible markup language (XML) document are provided. The techniques include receiving an XML document input, identifying a node in the XML document at which partitioning of the XML document is to begin, sending partition information to each respective processor, performing a shallow parsing of the XML document in parallel on the one or more processors, wherein each processor performs shallow parsing using the identified partition node until it reaches its identified partition, using the shallow parsing to generate the partition of the input XML document, wherein each processor generates a different partition of the same XML document, and sending each partition in streaming format to an ETL job instance.
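The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `generate_partition` and `partition_in_parallel`, and the use of a `(start, stop)` index range as the "partition information", are hypothetical and chosen only for illustration. The standard library's streaming `iterparse` stands in for shallow parsing, threads stand in for processors, and the sketch assumes that partition-node elements do not nest inside one another.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def generate_partition(xml_text, partition_tag, start, stop):
    """Scan the document and keep occurrences [start, stop) of partition_tag.

    Every worker reads the same document from the beginning, skips the
    partition-node elements that belong to other workers, and stops as soon
    as its own range has been emitted.
    """
    kept = []
    index = 0  # number of completed partition_tag elements seen so far
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != partition_tag:
            continue
        if start <= index < stop:
            elem.tail = None  # drop trailing whitespace before serializing
            kept.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # release the element's children to keep memory low
        index += 1
        if index >= stop:
            break  # this worker's partition is complete
    return kept


def partition_in_parallel(xml_text, partition_tag, ranges):
    """Run one worker per (start, stop) range and return their partitions in order."""
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [
            pool.submit(generate_partition, xml_text, partition_tag, start, stop)
            for start, stop in ranges
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical input: six <order> elements split across three workers.
    doc = "<orders>" + "".join(
        f"<order id='{i}'><item>x{i}</item></order>" for i in range(6)
    ) + "</orders>"
    for part in partition_in_parallel(doc, "order", [(0, 2), (2, 4), (4, 6)]):
        print(part)
```

In this sketch each returned list is one partition that could be handed to a separate ETL job instance. A real system would stream the elements rather than collect them in memory.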