Abstract:
The present disclosure relates to performing similarity metric analysis and data enrichment using knowledge sources. A data enrichment service can compare an input data set to reference data sets stored in a knowledge source to identify similar or related data. A similarity metric can be calculated that corresponds to the semantic similarity of two or more data sets. The similarity metric can be used to identify data sets based on their metadata attributes and data values, enabling easier indexing and high-performance retrieval of data values. An input data set can be labeled with a category based on the reference data set that best matches the input data set. The similarity of an input data set to a data set provided by a knowledge source can be used to query the knowledge source for additional information about the data set. The additional information can be used to provide recommendations to the user.
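The abstract does not name a particular similarity metric. As a minimal sketch, assuming Jaccard similarity over normalized data values as a stand-in for the semantic similarity it describes, an input data set could be labeled with the category of its best-matching reference data set as follows. The class and function names (ReferenceDataSet, jaccard, label_input) are hypothetical, and the follow-up step of querying the knowledge source for additional information is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceDataSet:
    """Hypothetical reference data set held by a knowledge source.

    Assumption: its values are already stripped and lower-cased.
    """
    category: str
    values: frozenset[str]


def jaccard(a: set[str] | frozenset[str], b: set[str] | frozenset[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def label_input(values: list[str], references: list[ReferenceDataSet]) -> tuple[str, float]:
    """Label an input data set with the category of its best-matching reference."""
    if not references:
        raise ValueError("knowledge source has no reference data sets")
    # Light normalization so that "Texas" and "texas " compare equal.
    normalized = {v.strip().lower() for v in values}
    scored = [(jaccard(normalized, ref.values), ref.category) for ref in references]
    # Highest score wins; ties are broken deterministically by category name.
    score, category = max(scored)
    return category, score
```

For example, with one reference set of US state names and one of color names, label_input(["Texas", "Ohio", "Nevada"], refs) returns ('us_state', 0.4): two of the five distinct values in the union are shared. A production system would likely combine value overlap with metadata attributes, as the abstract suggests.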
Abstract:
Techniques are disclosed for a data enrichment system that enables declarative importation and exportation of external data sources. Via a user interface, a user can provide input identifying different data sources from which to obtain input data. The data enrichment system is configured to import from and export to various types of sources storing resources, such as URL-based resources and HDFS-based resources, for high-speed, bidirectional interchange of metadata and data. Connection metadata (e.g., credentials, access paths) can be managed by the data enrichment system in a declarative format that facilitates managing and visualizing the connection metadata.
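The abstract does not specify the declarative format. As a minimal sketch, assuming a JSON document whose field names (connections, name, access_path, credentials) are hypothetical, connection metadata could be loaded, restricted to URL- and HDFS-based sources, and rendered with credentials masked for display in a user interface.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical: the source types this sketch accepts (URL- and HDFS-based).
SUPPORTED_SCHEMES = {"http", "https", "hdfs"}


@dataclass(frozen=True)
class Connection:
    name: str
    access_path: str
    username: str | None
    password: str | None

    @property
    def scheme(self) -> str:
        """Source type derived from the access path, e.g. 'hdfs' or 'https'."""
        return urlparse(self.access_path).scheme

    def describe(self) -> dict[str, str]:
        """A view that is safe to visualize: the password is masked."""
        return {
            "name": self.name,
            "scheme": self.scheme,
            "access_path": self.access_path,
            "username": self.username or "",
            "password": "****" if self.password else "",
        }


def load_connections(spec: str) -> list[Connection]:
    """Parse a declarative JSON spec and reject unsupported source types."""
    connections = []
    for entry in json.loads(spec)["connections"]:
        creds = entry.get("credentials", {})
        conn = Connection(
            name=entry["name"],
            access_path=entry["access_path"],
            username=creds.get("username"),
            password=creds.get("password"),
        )
        if conn.scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported source type for {conn.name!r}: {conn.scheme!r}")
        connections.append(conn)
    return connections
```

Keeping the metadata declarative means the same document can drive both import/export and display; describe() is one hypothetical way to separate the displayable view from the secret values.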
Abstract:
A search system associates contextual metadata with search terms and/or stored terms to facilitate identification of relevant information. In one implementation, a search term is identified (4304) from a received search request. The search term is then rewritten (4306) in standard form, and the standard-form term is set (4308) as the current search parameter. A source database is then searched (4310) using the current search parameter. If any results are obtained (4312), these results may be output (4320) to the user. If no results are obtained, a parent classification of the search term is set (4316) as the current search parameter and the search is repeated. The invention thereby combines the ease of use of term searching with the comprehensiveness of category searching.
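The back-off loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the synonym table, classification hierarchy, and source database are hypothetical in-memory dictionaries (STANDARD_FORMS, PARENT, DATABASE), and the whole request string is treated as the search term (step 4304). Comments map each line to the reference numerals in the abstract.

```python
from __future__ import annotations

# Hypothetical stand-ins for the synonym table, taxonomy, and source database.
STANDARD_FORMS = {"sofa": "couch", "settee": "couch"}
PARENT = {"couch": "seating", "seating": "furniture", "furniture": None}
DATABASE = {
    "seating": ["Oak bench", "Office chair"],
    "furniture": ["Oak bench", "Office chair", "Bookshelf"],
}


def standard_form(term: str) -> str:
    """Rewrite a raw search term into its standard form (4306)."""
    key = term.strip().lower()
    return STANDARD_FORMS.get(key, key)


def search(request: str) -> tuple[str | None, list[str]]:
    """Search with the standard-form term, widening to parent classifications."""
    current: str | None = standard_form(request)  # 4306, then set as parameter (4308)
    while current is not None:
        results = DATABASE.get(current, [])        # search source database (4310)
        if results:                                # any results obtained? (4312)
            return current, results                # output to the user (4320)
        current = PARENT.get(current)              # set parent classification (4316)
    return None, []                                # hierarchy exhausted, no results
```

Here search("Sofa") finds nothing for "couch", widens to "seating", and returns ('seating', ['Oak bench', 'Office chair']). The loop assumes the classification hierarchy is a tree; a cycle in the hypothetical PARENT table would make it run forever.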
Abstract:
A utility is provided for generating applications for a variety of data conversion or data handling application environments. A user can use a graphical user interface to assign purposes to application-adaptable modules and thereby define a desired application. In one implementation, the user interface includes a node tree panel and a process assembly panel. The node tree panel lists tool sets, including transformations, maps, and input-output tools. Using the assembly panel, these tools can then be assembled together with identified data sources and elements to define an application. In this manner, an application is generated from a number of generic modules simply by linking the modules to serve the purpose of the desired application.
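The idea of building an application by linking generic modules can be illustrated with a small function-composition sketch. This assumes each module is a function from a stream of records to a stream of records; the module names (csv_source, rename_map, uppercase_transform, link) are hypothetical and stand in for the input-output tools, maps, and transformations listed in the node tree panel.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Callable, Iterable
from typing import Any

Record = dict[str, Any]
Module = Callable[[Iterable[Record]], Iterable[Record]]


def csv_source(text: str) -> Iterable[Record]:
    """Generic input tool: read records from CSV text."""
    return csv.DictReader(io.StringIO(text))


def rename_map(mapping: dict[str, str]) -> Module:
    """Generic map module: rename fields according to a mapping."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for r in records:
            yield {mapping.get(k, k): v for k, v in r.items()}
    return run


def uppercase_transform(field: str) -> Module:
    """Generic transformation module: upper-case one string field if present."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for r in records:
            out = dict(r)
            if isinstance(out.get(field), str):
                out[field] = out[field].upper()
            yield out
    return run


def link(*modules: Module) -> Module:
    """Assemble modules into one application by feeding each output to the next."""
    def application(records: Iterable[Record]) -> Iterable[Record]:
        for module in modules:
            records = module(records)
        return records
    return application
```

For instance, link(rename_map({"cust": "customer"}), uppercase_transform("customer")) applied to csv_source("cust,qty\nacme,3\n") yields a single record {'customer': 'ACME', 'qty': '3'}. None of the modules knows about the others, which mirrors the abstract's point that the application comes from linking rather than from writing new code.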