Abstract:
A method, computer program product, and system for enabling parallel processing of an XML document without pre-parsing, utilizing metadata associated with the XML document and created at the same time as the XML document. The metadata is used to generate partitions of the XML document at the time of parallel processing, without requiring system-intensive pre-parsing.
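The idea is that the writer records "safe" cut points while it produces the document, so the reader can split without pre-parsing. Below is a minimal sketch under three assumptions:

- The metadata is simply a list of character offsets where top-level records begin.
- The document is a flat sequence of records.
- `write_with_metadata`, `make_partitions` and `count_items` are hypothetical names, and `count_items` stands in for real per-partition work.

```python
from concurrent.futures import ThreadPoolExecutor

def write_with_metadata(values):
    """Serialize records and, in the same pass, record the offset where each record starts."""
    chunks, offsets, pos = ["<r>"], [], len("<r>")
    for v in values:
        offsets.append(pos)  # safe cut point: between two complete records
        rec = f"<item>{v}</item>"
        chunks.append(rec)
        pos += len(rec)
    chunks.append("</r>")
    return "".join(chunks), offsets

def make_partitions(doc, offsets, n_parts):
    """Cut doc into at most n_parts chunks, only at offsets taken from the metadata (no parsing)."""
    if n_parts <= 1 or not offsets:
        return [doc]
    target = len(doc) / n_parts
    cuts = []
    for k in range(1, n_parts):
        # choose the recorded safe offset closest to the ideal even-split point
        best = min(offsets, key=lambda off: abs(off - k * target))
        if 0 < best < len(doc) and (not cuts or best > cuts[-1]):
            cuts.append(best)
    edges = [0] + cuts + [len(doc)]
    return [doc[a:b] for a, b in zip(edges, edges[1:])]

def count_items(part):
    # placeholder for real per-partition processing
    return part.count("<item>")

doc, offsets = write_with_metadata(range(4))
parts = make_partitions(doc, offsets, n_parts=2)
with ThreadPoolExecutor() as pool:
    counts = list(pool.map(count_items, parts))
```

The fragments are not individually well-formed, because the root tags stay with the first and last chunks. A fuller design would carry enclosing-context information in the metadata as well.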
Abstract:
The present invention relates to a method, computer program product and system for masking sensitive data and, more particularly, to dynamically de-identifying sensitive data from a data source for a target application, including enabling a user to selectively alter an initial de-identification protocol for the sensitive data elements via an interface.
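One way to picture "an initial de-identification protocol that the user can selectively alter" is a default field-to-masking map merged with user overrides. This is a minimal sketch under the following assumptions:

- The override dictionary stands in for the interface.
- The masking functions and field names are hypothetical.
- Unknown fields are fully redacted as a conservative default (also an assumption).

```python
from typing import Callable, Dict, Optional

Masker = Callable[[str], str]

def redact(v: str) -> str:
    return "*" * len(v)

def keep_last4(v: str) -> str:
    return "*" * max(len(v) - 4, 0) + v[-4:]

def passthrough(v: str) -> str:
    return v

# Hypothetical initial protocol: field name -> masking function.
INITIAL_PROTOCOL: Dict[str, Masker] = {"name": redact, "ssn": keep_last4, "city": passthrough}

def de_identify(record: Dict[str, str], protocol: Dict[str, Masker],
                overrides: Optional[Dict[str, Masker]] = None) -> Dict[str, str]:
    """Apply the initial protocol, with any user overrides taking precedence per field."""
    effective = {**protocol, **(overrides or {})}
    # fields not covered by the protocol are redacted by default
    return {k: effective.get(k, redact)(v) for k, v in record.items()}

record = {"name": "Alice", "ssn": "123456789", "city": "Austin"}
masked = de_identify(record, INITIAL_PROTOCOL)
masked_override = de_identify(record, INITIAL_PROTOCOL, {"city": redact})
```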
Abstract:
The present invention relates to a method, computer program product and system for de-identifying data, wherein a de-identification protocol is selectively mapped to a business rule at runtime via an ETL tool.
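The runtime mapping can be sketched as a lookup that an ETL transform stage performs each time it runs. Editing the rule-to-protocol table then changes behavior without rebuilding the job. The protocol names, rule names and `transform` function are hypothetical illustrations, not a specific ETL tool's API.

```python
import hashlib

# Hypothetical protocol catalogue; names and behaviours are illustrative only.
PROTOCOLS = {
    "hash": lambda v: hashlib.sha256(v.encode()).hexdigest()[:12],
    "null": lambda v: None,
    "keep": lambda v: v,
}

# Business rule -> protocol name. Editing this table changes behaviour on the
# next run without rebuilding the ETL job.
RULE_MAP = {"offshore_test": "null", "internal_qa": "hash", "production": "keep"}

def transform(rows, field, rule):
    """ETL transform stage: resolve the protocol for `rule` when the stage runs, then apply it."""
    protocol = PROTOCOLS[RULE_MAP[rule]]
    return [{**row, field: protocol(row[field])} for row in rows]

rows = [{"id": 1, "email": "a@example.com"}]
```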
Abstract:
A method, a computer program product, and a system identify partition locations within an extensible markup language (XML) document without parsing, so that portions of the document can be processed in parallel. The XML document includes sections that are required to remain continuous. The document is scanned for these continuous sections without parsing, and the boundaries of the initial partitions are adjusted to lie outside the continuous sections, which determines the resulting partitions for the document. The resulting partitions may then be processed in parallel to provide the document's information for storage.
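The boundary adjustment can be sketched with a purely lexical scan: a regular expression search rather than an XML parser. It starts from evenly spaced cuts and pushes any cut that falls inside a protected span to that span's end. The abstract does not say which sections must stay continuous, so treating CDATA sections and comments as those sections is an assumption. The function names are hypothetical.

```python
import re

# Assumption: CDATA sections and comments are the sections that must stay continuous.
CONTINUOUS = re.compile(r"<!\[CDATA\[.*?\]\]>|<!--.*?-->", re.DOTALL)

def continuous_spans(doc):
    """Lexical scan, no XML parsing: [start, end) spans that must not be split."""
    return [m.span() for m in CONTINUOUS.finditer(doc)]

def adjust_boundaries(doc, n_parts):
    """Start from evenly spaced cuts and push any cut that lands inside a span to its end."""
    spans = continuous_spans(doc)
    step = len(doc) // n_parts
    cuts = []
    for k in range(1, n_parts):
        cut = k * step
        for start, end in spans:
            if start < cut < end:
                cut = end  # matches never overlap, so this position is outside every span
                break
        if cut < len(doc) and (not cuts or cut > cuts[-1]):
            cuts.append(cut)
    return cuts

def partitions(doc, cuts):
    edges = [0] + cuts + [len(doc)]
    return [doc[a:b] for a, b in zip(edges, edges[1:])]

doc = "<a><![CDATA[" + "x" * 20 + "]]><b>y</b></a>"
cuts = adjust_boundaries(doc, 2)
```

A production version would probably also align each cut to a tag boundary. The sketch covers only the continuous-section rule stated in the abstract.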
Abstract:
Methods and arrangements for extracting tuples from a streaming XML document. A query twig is applied to the XML document stream, tuples are extracted from the stream based on the query twig, and the number of extracted tuples is limited by forgoing extraction of duplicate tuples and of tuples that do not satisfy the query twig criteria.
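Streaming extraction with both pruning rules (skip duplicates, skip tuples that fail the twig's criteria) can be sketched with the standard library's incremental `xml.etree.ElementTree.iterparse`. This is a simplified stand-in, not general twig matching:

- The "twig" is reduced to one record tag, a set of child tags and a value predicate.
- The function name and sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def extract_tuples(stream, record_tag, fields, predicate):
    """Yield one tuple per matching record as the stream is read, skipping
    duplicates and tuples that fail the twig's value predicate."""
    seen = set()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != record_tag:
            continue
        tup = tuple(elem.findtext(f) for f in fields)
        elem.clear()  # release the processed subtree
        if None in tup or tup in seen or not predicate(tup):
            continue  # structural miss, duplicate, or value predicate failed
        seen.add(tup)
        yield tup

xml = (b"<lib>"
       b"<book><title>A</title><year>2001</year></book>"
       b"<book><title>A</title><year>2001</year></book>"
       b"<book><title>B</title><year>1990</year></book>"
       b"<book><title>C</title><year>2010</year></book>"
       b"</lib>")
result = list(extract_tuples(io.BytesIO(xml), "book", ("title", "year"),
                             lambda t: int(t[1]) >= 2000))
```

With these inputs, the duplicate "A" record and the pre-2000 "B" record are never emitted.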
Abstract:
A computer-implemented method, apparatus, and computer usable program code for generating an execution plan graph from a data flow. A metadata representation of the data flow is generated in response to receiving the data flow. A set of code units is generated from the metadata representation. Each code unit in the set of code units is executable on multiple different types of runtime engines. The set of code units is processed to produce the execution plan graph.
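The pipeline (data flow, then metadata, then engine-neutral code units, then execution plan graph) can be sketched with the standard library's `graphlib.TopologicalSorter`. The sketch makes these assumptions:

- Operators are limited to `scan`, `filter` and `sink`.
- "Executable on multiple runtime engines" is modeled as rendering the same unit through per-engine text templates.
- The templates, the `CodeUnit` class and the helper names are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical per-engine templates: the same code unit renders for either engine.
TEMPLATES = {
    "sql": {"scan": "SELECT * FROM {arg}", "filter": "WHERE {arg}", "sink": "INSERT INTO {arg}"},
    "pyspark": {"scan": "spark.table('{arg}')", "filter": ".filter('{arg}')",
                "sink": ".write.saveAsTable('{arg}')"},
}

@dataclass
class CodeUnit:
    name: str
    op: str
    arg: str
    inputs: list = field(default_factory=list)

    def render(self, engine: str) -> str:
        return TEMPLATES[engine][self.op].format(arg=self.arg)

def to_metadata(flow):
    """Data flow (name -> (op, arg, inputs)) into an engine-neutral metadata dict."""
    return {n: {"op": op, "arg": arg, "inputs": list(ins)} for n, (op, arg, ins) in flow.items()}

def to_code_units(meta):
    return [CodeUnit(n, m["op"], m["arg"], m["inputs"]) for n, m in meta.items()]

def plan_graph(units):
    """Execution plan graph as edges input -> unit, plus one valid execution order."""
    deps = {u.name: set(u.inputs) for u in units}
    order = list(TopologicalSorter(deps).static_order())
    edges = [(src, u.name) for u in units for src in u.inputs]
    return order, edges

flow = {
    "out": ("sink", "report", ["big"]),
    "src": ("scan", "orders", []),
    "big": ("filter", "amount > 100", ["src"]),
}
units = to_code_units(to_metadata(flow))
order, edges = plan_graph(units)
```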
Abstract:
Methods and systems for implementing a splitter operation in an extract, transform, and load (ETL) process are provided. In one implementation, the method includes receiving a data flow that includes a splitter operation and generating an execution plan graph based on the data flow. The execution plan graph includes structured query language (SQL) code for implementing the splitter operation, and this SQL code is executable on database servers from different vendors.
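A splitter routes each input row to one or more output targets according to per-branch conditions. One plausible SQL rendering, assumed here rather than taken from the source, is one `INSERT ... SELECT ... WHERE` statement per branch in plain ANSI syntax. The sketch runs it against SQLite only to show that the generated statements execute. The table names and `splitter_sql` are hypothetical.

```python
import sqlite3

def splitter_sql(source, branches):
    """One INSERT ... SELECT per output branch, in plain ANSI SQL.
    Table names and conditions are interpolated directly, so they must
    come from trusted data-flow metadata, never from user input."""
    return [f"INSERT INTO {target} SELECT * FROM {source} WHERE {cond}"
            for target, cond in branches]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "EU"), (2, "US"), (3, "EU")])
for t in ("eu_orders", "us_orders"):
    conn.execute(f"CREATE TABLE {t} (id INTEGER, region TEXT)")

for stmt in splitter_sql("orders", [("eu_orders", "region = 'EU'"),
                                    ("us_orders", "region = 'US'")]):
    conn.execute(stmt)

eu = conn.execute("SELECT id FROM eu_orders ORDER BY id").fetchall()
us = conn.execute("SELECT id FROM us_orders ORDER BY id").fetchall()
```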
Abstract:
Systems and associated methods for address standardization, and applications related thereto, are described. Embodiments exploit the common context shared by a taxonomy and a given address to detect and correct deviations in the address. Embodiments establish a possible path from the root of the taxonomy to a leaf that could have generated a given address. Given a new address, embodiments use the complete address, and/or segments or elements thereof, to compute representations of the elements and find the closest matching leaf in the taxonomy. Embodiments then traverse the path back to the root node to detect agreement and disagreement between the path and the address entry. The taxonomical structure is thus used to detect, segregate, and standardize the expected fields.
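The two-phase idea can be sketched as follows: first find the closest root-to-leaf path, then walk it back toward the root and flag mismatches. The sketch makes these assumptions:

- The taxonomy is a toy country, state, city tree.
- `difflib.SequenceMatcher` string similarity stands in for the element "representations", which the abstract does not specify.
- The taxonomy contents and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy taxonomy: root -> country -> state -> city (leaves).
TAXONOMY = {
    "USA": {"CA": ["San Jose", "San Diego"], "NY": ["Albany"]},
    "India": {"KA": ["Bangalore", "Mysore"]},
}
LEVELS = ("country", "state", "city")

def leaf_paths(tax):
    for country, states in tax.items():
        for state, cities in states.items():
            for city in cities:
                yield (country, state, city)

def sim(a, b):
    # stand-in for the element representations: case-insensitive string similarity
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def standardize(address):
    # score every root-to-leaf path against the address elements it could have generated
    best = max(leaf_paths(TAXONOMY),
               key=lambda p: sum(sim(address.get(lvl, ""), node) for lvl, node in zip(LEVELS, p)))
    report = {}
    # walk the chosen path from the leaf back to the root, flagging agreement/disagreement
    for lvl, node in reversed(list(zip(LEVELS, best))):
        report[lvl] = "agree" if address.get(lvl, "") == node else "disagree"
    return dict(zip(LEVELS, best)), report

std, report = standardize({"country": "USA", "state": "CA", "city": "San Jsoe"})
```

Here the misspelled city is matched to the "San Jose" leaf. The walk back to the root then flags only the city field as disagreeing.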