Abstract:
A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
Abstract:
Systems and methods for dynamic product bundling are described herein. For example, embodiments dynamically generate a product bundle for a customer within a particular segment in view of that customer's interest in a particular product. Embodiments determine customer affinity, customer commonality, and product complementarity and use this information to dynamically generate and optimize product bundles for customers interested in one or more products.
Abstract:
An authorisation privilege for an access request is inferred when no explicit privilege exists. The inference can be performed by way of mining occurrence patterns or derived from user hierarchy, profile, click history, transaction history or role. For any access request, the respective explicit privilege or inferred privilege is verified by the database or security administrator before the access request is permitted. Conditions expressed in an access policy are evaluated on the occurrence of predefined events. The events extend beyond user access requests, and include external events, composite events and access of a referential type. The access policy is framed in ‘event, condition, access enforcement’ terminology. The access control rules can be parameterised and can be instantiated by data obtained from inference rules associated with the conditions of the policy. The conditions have an evaluation component and an inference component. The access privileges supported are: read, write and indirect read. An indirect read operation typically allows a user qualified access to one or more portions of a database, but not the entire database.
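The 'event, condition, access enforcement' flow described above can be illustrated with a minimal sketch. It is not the patented implementation: the names `EcaRule`, `resolve_privilege`, `handle_event`, the role-based inference table, and the `admin_verifies` callback are all hypothetical, and role lookup stands in for the richer inference sources (occurrence patterns, click history, and so on) that the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class EcaRule:
    event: str                         # event type that triggers evaluation
    condition: Callable[[dict], bool]  # evaluation component of the condition


def resolve_privilege(
    ctx: dict,
    explicit: Dict[Tuple[str, str], str],
    role_defaults: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (privilege, source); fall back to inference when no explicit grant exists."""
    key = (ctx["user"], ctx["resource"])
    if key in explicit:
        return explicit[key], "explicit"
    # Assumption: inference is a simple role lookup in this sketch.
    return role_defaults.get(ctx.get("role")), "inferred"


def handle_event(
    event: str,
    ctx: dict,
    rules: List[EcaRule],
    explicit: Dict[Tuple[str, str], str],
    role_defaults: Dict[str, str],
    admin_verifies: Callable[[dict, str, str], bool],
) -> bool:
    """Permit access only if a rule fires, the privilege matches, and the admin verifies it."""
    privilege, source = resolve_privilege(ctx, explicit, role_defaults)
    if privilege is None or privilege != ctx["requested"]:
        return False
    if not any(r.event == event and r.condition(ctx) for r in rules):
        return False
    # Both explicit and inferred privileges go through administrator verification.
    return admin_verifies(ctx, privilege, source)
```

In this sketch a privilege is one of "read", "write", or "indirect_read", and a request with no explicit or inferable privilege is denied before any rule is evaluated.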
Abstract:
In the context of data administration in enterprises, an effective manner of providing a central data warehouse is described, particularly via a tool that analyzes existing data and reports from different business units. In accordance with at least one embodiment of the invention, such a tool analyzes the data model of an enterprise and proposes alternatives for building a new data warehouse. The tool, in accordance with at least one embodiment of the invention, models the problem of identifying fact/dimension attributes of a warehouse model as a graph cut on a Dependency Analysis Graph (DAG). The DAG is built using existing data models and the report generation scripts. The tool also uses the DAG to generate ETL (Extract, Transform, Load) scripts that can be used to populate the newly proposed data warehouse from data present in the existing schemas.
Abstract:
In the context of cloud computing, effective methods and arrangements for storing and tracking provenance are provided. In accordance with at least one embodiment, a distributed file system is advantageously employed to store large amounts of provenance data. File creation involves the creation of both output files and reduce logs.
Abstract:
A computer-implemented method, a computer program product, and a data processing system for managing electronic messages are disclosed. The contents of an electronic message are segmented based on the recipients receiving the message. Access control authorizing access to the segmented contents is applied, and the access-controlled contents are transmitted to a list of recipients.
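As a rough illustration of recipient-based segmentation, the sketch below attaches a set of authorized recipients to each segment of a message and derives the view each recipient may read. The `Segment` type and the `views_for_recipients` function are hypothetical names chosen for this example; the abstract does not specify how segments or access control are represented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Segment:
    text: str
    allowed: FrozenSet[str]  # recipients authorized to read this segment


def views_for_recipients(
    segments: List[Segment], recipients: List[str]
) -> Dict[str, List[str]]:
    """For each recipient, keep only the segments that recipient is authorized to read."""
    return {
        recipient: [seg.text for seg in segments if recipient in seg.allowed]
        for recipient in recipients
    }
```

A recipient who is authorized for no segment receives an empty view in this sketch; a real system might instead drop that recipient or send a notice.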
Abstract:
Techniques for obtaining a lineage of a schema in one or more documents are provided. The techniques include using a schema to find a document that is most relevant to the schema, obtaining one or more relevant portions of the most relevant document that are related to the schema, constructing a first probe set from the one or more relevant portions of the document, using the first probe set to discover one or more documents for obtaining lineage information, discovering a second probe set from the one or more documents, and recursively using the second probe set to discover a related document.
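The recursive probe-set procedure can be sketched as follows, under strong simplifying assumptions that are not part of the abstract: documents are plain strings held in a dictionary, a "relevant portion" is any line that mentions a schema term, relevance is measured by naive term overlap, and a small hypothetical stop-word set filters out SQL and shell keywords. The names `terms`, `discover`, and `trace_lineage` are illustrative only.

```python
import re
from typing import Dict, List, Set

# Hypothetical stop words so that common keywords do not count as evidence of lineage.
STOP = frozenset({"select", "from", "into", "insert", "copy"})


def terms(text: str) -> Set[str]:
    """Naive tokenizer: lowercase identifiers minus stop words."""
    return {t for t in re.findall(r"[a-z_]+", text.lower()) if t not in STOP}


def discover(probe: Set[str], docs: Dict[str, str], visited: Set[str],
             min_overlap: int = 1) -> List[str]:
    """Find unvisited documents sharing terms with the probe set, then recurse."""
    found = [name for name in docs
             if name not in visited and len(probe & terms(docs[name])) >= min_overlap]
    if not found:
        return []
    visited.update(found)
    # The next probe set is built from the documents just discovered.
    next_probe = set().union(*(terms(docs[name]) for name in found))
    return found + discover(next_probe, docs, visited, min_overlap)


def trace_lineage(schema_terms: Set[str], docs: Dict[str, str]) -> List[str]:
    """Start at the most relevant document and follow probe sets outward."""
    start = max(docs, key=lambda name: len(schema_terms & terms(docs[name])))
    portions = [ln for ln in docs[start].splitlines() if schema_terms & terms(ln)]
    first_probe = set().union(*(terms(ln) for ln in portions))
    return [start] + discover(first_probe, docs, {start})
```

The recursion stops when a probe set discovers no new documents, so each document is visited at most once.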
Abstract:
A user can highlight text and provide accompanying annotations. Highlighted text, accompanying annotations, and time-stamp information are stored in a user profile that is maintained locally with a web browser, at the client side. A retrieved web page is presented to a user with annotations of some form, based upon the user profile. The retrieved web page may typically be annotated through marked or highlighted portions of text, so that the user can readily locate this information in the web page, and assess the relevance of the retrieved page.