摘要:
The present invention relates to a method, computer program product and system for masking sensitive data and, more particularly, to dynamically de-identifying sensitive data from a data source for a target application, including enabling a user to selectively alter an initial de-identification protocol for the sensitive data elements via an interface.
摘要:
Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
摘要:
Methods and arrangements for extracting tuples from a streaming XML document. A query twig is applied to the XML document stream, tuples are extracted from the XML document stream based on the query twig, and a quantity of extracted tuples is limited via foregoing extraction of duplicate tuples extraction of tuples that do not satisfy query twig criteria.
摘要:
Disclosed is a data processing system implemented method, a data processing system and an article of manufacture for improving execution efficiency of a database workload to be executed against a database. The database includes database tables, and the database workload identifies at least one of the database tables. The data processing system includes an identification module for identifying candidate database tables being identifiable in the database workload, the identified candidate database tables being eligible for organization under a clustering schema, a selection module for selecting the identified candidate tables according to whether execution of the database workload is improved if the selected identified candidate table is organized according to the clustering scheme, and an organization module for organizing the clustering schema of the selected organized identified candidate tables prior to the database workload being execution against the database.
摘要:
A system and method of converting a recursive XML document into a relational schema comprises providing a recursive XML document; parsing an external mapping script specifying a mapping from the recursive XML document to a relational table format; building a recursive shredding tree based on the external mapping script and the relational table format; and shredding the mapped recursive XML document into a relational table. The system and method further comprise detecting whether any of a XML schema and a DTD document is recursive, wherein the detecting comprises building a directed graph comprising element names; corresponding elements names as nodes in the directed graph; forming arcs from every element parent node to every element child node of the element parent node; and checking for cycles in the directed graph. The system and method further comprise identifying all recursive cursor nodes and a recursive degree corresponding to the recursive shredding tree.
摘要:
A repository index records the position of document entries relative to landmark entries within the document. Landmark entries are selecting using a landmarking policy and their position relative to the document are stored in a landmark directory. During index updates, an edit transcript is generated describing the difference between old and new document versions, and both the document repository index and the landmark directory are updated as needed. Thus, the number of update operations preformed as compared with conventional indexing techniques may be substantially reduced when small, localized changes are made to the document. This is due to fact that the positions of document entries are recorded relative to the landmark entries rather than the document itself. By doing so, the document index becomes more shift-invariant, requiring fewer update operations when entries are added or inserted in localized areas of the document.
摘要:
The present invention relates to a method, computer program product and system for de-identifying data, wherein a de-identification protocol is selectively mapped to a business rule at runtime via an ETL tool.
摘要:
A method, computer program product, and system for enabling parallel processing of an XML document without pre-parsing, utilizing metadata associated with the XML document and created at the same time as the XML document. The metadata is used to generate partitions of the XML document at the time of parallel processing, without requiring system-intensive pre-parsing.
摘要:
The present invention relates to a method, computer program product and system for masking sensitive data and, more particularly, to dynamically de-identifying sensitive data from a data source for a target application, including enabling a user to selectively alter an initial de-identification protocol for the sensitive data elements via an interface.
摘要:
The present invention relates to a method, computer program product and system for de-identifying data, wherein a de-identification protocol is selectively mapped to a business rule at runtime via an ETL tool.