Abstract:
A computer-implemented method for detecting a set of inconsistent data records in a database including multiple records comprises: selecting a data quality rule representing a functional dependency for the database; transforming the data quality rule into at least one rule vector with hashed components; selecting a set of attributes of the database; transforming at least one record of the database, selected on the basis of the selected attributes, into a record vector with hashed components; and computing a dot product of the rule and record vectors to generate a measure representing violation of the data quality rule by the record.
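The abstract does not say how the rule and record vectors are encoded, so the following is only a minimal sketch under stated assumptions. It assumes a rule of the form "if the left-hand attributes take these values, the right-hand attribute must take this value", sparse vectors keyed by a 64-bit hash of each attribute/value pair, and a weighting chosen so that the dot product equals the number of left-hand attributes exactly when the record matches the left-hand side but not the right-hand side. The names `hashed_key`, `rule_vector`, `record_vector`, and `violates` are hypothetical.

```python
import hashlib


def hashed_key(attr, value):
    """Stable 64-bit hash of one attribute/value pair (assumed encoding)."""
    digest = hashlib.sha256(f"{attr}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rule_vector(lhs, rhs):
    """Sparse rule vector: +1 per left-hand pair, -len(lhs) for the right-hand pair.

    lhs is a non-empty dict {attribute: value}; rhs is an (attribute, value) tuple.
    """
    vec = {hashed_key(a, v): 1.0 for a, v in lhs.items()}
    vec[hashed_key(*rhs)] = -float(len(lhs))
    return vec


def record_vector(record, attributes):
    """Sparse record vector over the selected attributes only."""
    return {hashed_key(a, record[a]): 1.0 for a in attributes if a in record}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v[k] for k, w in u.items() if k in v)


def violates(record, lhs, rhs):
    """True when the record matches every left-hand pair but not the right-hand pair."""
    attributes = list(lhs) + [rhs[0]]
    score = dot(rule_vector(lhs, rhs), record_vector(record, attributes))
    # Full left-hand match contributes len(lhs); a right-hand match cancels it to 0.
    return score == len(lhs)
```

For example, with the rule `zip = 90210` implies `city = Beverly Hills`, a record holding that zip code with a different city scores 1 and is flagged, while a record with the expected city scores 0. Hash collisions could distort the score in principle; with 64-bit keys this is very unlikely, but a dense, lower-dimensional encoding would need to account for it.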
Abstract:
A system for allowing consistent execution of a workflow process in a computer-enabled workflow management system is described. The system includes a workflow process database accessible by the workflow process. The workflow process includes at least one sequence of workflow actions, has at least one set of parallel workflow actions, and is configured as a plurality of nodes interconnected by arcs. Each node defines at least one of the workflow actions and reads and writes data items when executing the workflow actions. A first module locks all data items in the workflow process database that are specified for access by the workflow process, before execution of the workflow process commences, so that other workflow processes cannot access them during that execution. A second module releases all the locked data items after the workflow process has been executed, such that execution consistency and concurrency of the workflow process are maintained. A computer-implemented method for allowing consistent execution of a workflow process in a computer-enabled workflow management system is also described.
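The lock-everything-first, release-everything-after scheme described above can be sketched as follows. This is an illustration only, assuming an in-process lock table guarded by a condition variable; the class `WorkflowLockManager` and its methods are hypothetical names, not the system's actual modules.

```python
import threading
from contextlib import contextmanager


class WorkflowLockManager:
    """Grants a workflow all of its declared data items at once, or makes it wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._locked = set()

    def lock_all(self, items):
        """First module: lock every declared item before execution starts."""
        wanted = set(items)
        with self._cond:
            # Wait until no requested item is held by another workflow.
            while self._locked & wanted:
                self._cond.wait()
            self._locked |= wanted

    def release_all(self, items):
        """Second module: release every item once execution has finished."""
        with self._cond:
            self._locked -= set(items)
            self._cond.notify_all()

    def locked_items(self):
        """Snapshot of the items currently locked."""
        with self._cond:
            return frozenset(self._locked)

    @contextmanager
    def run(self, items):
        """Hold all declared items for the duration of one workflow execution."""
        self.lock_all(items)
        try:
            yield
        finally:
            self.release_all(items)
```

A workflow would wrap its whole execution in `with manager.run(["order", "inventory"]): ...`. Because the items are acquired in a single all-or-nothing step, two workflows cannot each hold part of what the other needs, so this sketch avoids deadlock; workflows with disjoint item sets still run concurrently.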
Abstract:
A computer-implemented method comprising: partitioning data representing an input instance of a database including multiple tuples into multiple fragments of tuples; detecting tuples which violate a data quality specification in respective ones of the fragments; selecting a data cleaning asset on the basis of characteristics of errors in detected tuples for a fragment and based on declared asset capabilities; assigning the selected data cleaning asset to the fragment, the selected data cleaning asset to provide a set of candidate corrections for the detected tuples in the fragment; and providing data representing an output instance of the database in which detected tuples are replaced with selected candidate corrections.
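The steps above can be sketched as a small pipeline. This is a sketch under several assumptions that the abstract does not confirm: fragments are fixed-size slices, the data quality specification is a set of named per-tuple checks, an asset is chosen by how many of the observed error types its declared capabilities cover, and the first candidate correction is the one selected. All names (`CleaningAsset`, `partition`, `detect`, `select_asset`, `clean`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CleaningAsset:
    name: str
    capabilities: frozenset          # error types the asset declares it can handle
    repair: Callable[[dict], list]   # returns candidate corrections for one tuple


def partition(tuples, size):
    """Split the input instance into fixed-size fragments (assumed strategy)."""
    return [tuples[i:i + size] for i in range(0, len(tuples), size)]


def detect(fragment, checks):
    """Map each violating tuple's index to the set of error types it exhibits."""
    errors = {}
    for i, row in enumerate(fragment):
        kinds = {kind for kind, passes in checks.items() if not passes(row)}
        if kinds:
            errors[i] = kinds
    return errors


def select_asset(errors, assets):
    """Pick the asset whose declared capabilities cover the most observed error types."""
    seen = set().union(*errors.values())
    return max(assets, key=lambda asset: len(asset.capabilities & seen))


def clean(tuples, checks, assets, size=2):
    """Return an output instance with detected tuples replaced by a correction."""
    output = []
    for fragment in partition(tuples, size):
        errors = detect(fragment, checks)
        if not errors:
            output.extend(fragment)
            continue
        asset = select_asset(errors, assets)
        for i, row in enumerate(fragment):
            if i in errors:
                candidates = asset.repair(row)
                # Placeholder selection policy: take the first candidate, if any.
                output.append(candidates[0] if candidates else row)
            else:
                output.append(row)
    return output
```

In a fuller version, selecting among candidate corrections would likely involve ranking or user feedback rather than taking the first candidate, and asset selection could weigh cost or accuracy as well as coverage.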