摘要:
Methods and arrangements for extracting tuples from a streaming XML document. A query twig is applied to the XML document stream, tuples are extracted from the XML document stream based on the query twig, and a quantity of extracted tuples is limited via foregoing extraction of duplicate tuples extraction of tuples that do not satisfy query twig criteria.
摘要:
Embodiments of the invention disclose a method, a system and a computer program product of discovering automated insights in XML data by generating a query result in response to querying data using a query, wherein the data is in a markup language format, and identifying a pattern associated with the query result, wherein the data in the markup language format is used for pattern identification.
摘要:
Embodiments of the invention broadly contemplate systems, apparatuses and methods providing simplified file searching using peripheral external information derived from one or more external sources. The peripheral external information corresponds to the context in which the user received and/or saved the file.
摘要:
The present invention relates to data cleansing, and in particular performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system and computer program product for standardizing data within a database engine, configuring the standardization function to determine at least one standardized value for at least one data value by applying the standardization table in a context of at least one data value, receiving a database query identifying the standardization function, at least one database value and the context of the data, and invoking the standardization function.
摘要:
Computer software is disclosed for discovering and representing a reporting model of an existing reporting environment. For each report in a plurality of reports, the software searches metadata of the report for descriptive information and dependencies on other reports. The software depicts, in a graphical representation, each report and relationships between the reports.
摘要:
A system for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code configured to perform at least one of data preparation and data analysis.
摘要:
The present invention relates to data cleansing, and in particular performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system and computer program product for standardizing data within a database engine, configuring the standardization function to determine at least one standardized value for at least one data value by applying the standardization table in a context of at least one data value, receiving a database query identifying the standardization function, at least one database value and the context of the data, and invoking the standardization function.
摘要:
Embodiments of the invention disclose a method, a system and a computer program product of discovering automated insights in XML data by generating a query result in response to querying data using a query, wherein the data is in a markup language format, and identifying a pattern associated with the query result, wherein the data in the markup language format is used for pattern identification.
摘要:
Embodiments of the invention disclose a method, a system and a computer program product of discovering automated insights in XML data by generating a query result in response to querying data using a query, wherein the data is in a markup language format, and identifying a pattern associated with the query result, wherein the data in the markup language format is used for pattern identification.
摘要:
Systems and associated methods providing a document corpus navigation interface including domain independent facets are described. Embodiments provide a list of domain independent facets, extract facet values from the document corpus, learn the facet values from the corpus, map each document to one of the values of each of the facets, and automatically determine a weight of the relationship.