摘要:
Systems and associated methods providing a document corpus navigation interface including domain independent facets are described. Embodiments provide a list of domain independent facets, extract facet values from the document corpus, learn the facet values from the corpus, map each document to one of the values of each of the facets, and automatically determine a weight of the relationship.
摘要:
A method, system and a computer program product for determining resources allocation in a distributed computing environment. An embodiment may include identifying resources in a distributed computing environment, computing provisioning parameters, computing configuration parameters and quantifying service parameters in response to a set of service level agreements (SLA). The embodiment may further include iteratively computing a completion time required for completion of the assigned task and a cost. Embodiments may further include computing an optimal resources configuration and computing at least one of an optimal completion time and an optimal cost corresponding to the optimal resources configuration. Embodiments may further include dynamically modifying the optimal resources configuration in response to at least one change in at least one of provisioning parameters, computing parameters and quantifying service parameters.
摘要:
Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
摘要:
During application of data quality rules to a data set obtained from a data source, data is retrieved from the data source along with a common set of rules configured to format the retrieved data in a manner in accordance with one or more predefined data quality rules of the common set of rules. At least one predefined data quality rule is adjusted utilizing at least one editable widget to form a modified set of data quality rules adapted for use with a specified application. The modified set of data quality rules is applied to the retrieved data.
摘要:
Systems and methods are provided for file searching on mobile devices. A system includes a user interface and a file query system. The user interface is for receiving a user-provided spatio-temporal query for use in searching for a particular file. The user-provided spatio-temporal query is provided by a user of a mobile device. The file query system is for determining information about the particular file responsive to the user-provided spatio-temporal query, and identifying from the information one or more files as a search result for the particular file.
摘要:
A computer receives an electronic document that includes a group of terms. The computer sends the electronic document to an information extraction program that extracts specific terms from the group of terms. Each of the specific terms that match to a certain extent with one of the attribute values in an electronic dictionary is identified. A value associated with the electronic document is generated based on the specific terms that match, and on an end-user that is attempting to access the electronic document.
摘要:
The present invention relates to data cleansing, and in particular performing the semantic standardization process within a database before the transform portion of the extract-transform-load (ETL) process. Provided are a method, system and computer program product for standardizing data within a database engine, configuring the standardization function to determine at least one standardized value for at least one data value by applying the standardization table in a context of at least one data value, receiving a database query identifying the standardization function, at least one database value and the context of the data, and invoking the standardization function.
摘要:
Methods, computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
摘要:
Computer program products and systems are provided for mining for sub-patterns within a text data set. The embodiments facilitate finding a set of N frequently occurring sub-patterns within the data set, extracting the N sub-patterns from the data set, and clustering the extracted sub-patterns into K groups, where each extracted sub-pattern is placed within the same group with other extracted sub-patterns based upon a distance value D that determines a degree of similarity between the sub-pattern and every other sub-pattern within the same group.
摘要:
According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.