Abstract:
A standard mechanism for directly accessing unstructured data types (e.g., image, audio, video, gene sequencing and text data) in accordance with data mining operations is provided. The subject innovation can enable access to unstructured data directly from within the data mining engine or tool. Accordingly, the innovation enables multiple vendors to provide algorithms for mining unstructured data on a data mining platform (e.g., an SQL-brand server), thereby increasing adoption. As well, the subject innovation allows users to directly mine unstructured data that is not fixed-length, without pre-processing and tokenizing the data external to the data mining engine. In accordance therewith, the innovation can provide a mechanism to expand declarative language content types to include an “unstructured” data type thereby enabling a user and/or application to affirmatively designate mining data as an unstructured type.
Abstract:
The subject disclosure pertains to extensible data mining systems, means, and methodologies. For example, a data mining system is disclosed that supports plug-in or integration of non-native mining algorithms, perhaps provided by third parties, such that they function the same as built-in algorithms. Furthermore, non-native data mining viewers may also be seamlessly integrated into the system for displaying the results of one or more algorithms including those provided by third parties as well as those built-in. Still further yet, support is provided for extending data mining languages to include user-defined functions (UDFs).
Abstract:
A system that facilitates data mining comprises a reception component that receives one or more commands in a declarative language relating to the use of an output of a first data mining model as an input to a second data mining model. An implementation component analyzes the received commands and implements them with respect to the first and second data mining models. In another aspect of the subject invention, the reception component can receive further commands in a declarative language that cause one or more of the first and second data mining models to output a prediction, desirably generated without prediction input; the implementation component then causes the one or more models to output the prediction.
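The chaining idea can be illustrated with a minimal sketch. The abstract describes commands in a declarative language; the sketch below swaps in plain Python instead, and the `MajorityModel` class and `chain` function are hypothetical names introduced only for illustration. It shows one model's output feeding a second model, and a prediction made with no prediction input by falling back to the most common label seen in training.

```python
from collections import Counter, defaultdict


class MajorityModel:
    """Toy model (hypothetical): predicts the most common label seen for an input."""

    def __init__(self):
        self._by_input = defaultdict(Counter)
        self._overall = Counter()

    def train(self, pairs):
        # Each pair is (input value, label).
        for value, label in pairs:
            self._by_input[value][label] += 1
            self._overall[label] += 1
        return self

    def predict(self, value=None):
        # With no input, or an unseen input, fall back to the overall
        # most common label observed during training.
        counts = self._by_input.get(value) if value is not None else None
        if not counts:
            counts = self._overall
        return counts.most_common(1)[0][0]


def chain(first, second, value):
    """Use the output of the first model as the input to the second."""
    return second.predict(first.predict(value))


segment = MajorityModel().train([(25, "young"), (30, "young"), (60, "senior")])
offer = MajorityModel().train(
    [("young", "games"), ("young", "games"), ("senior", "travel")]
)
chained_prediction = chain(segment, offer, 60)
default_prediction = offer.predict()
```

Here `chained_prediction` comes from passing the segment model's output into the offer model, while `default_prediction` is produced without any prediction input.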
Abstract:
A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.
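As a rough sketch of the method, the three parts (data source, term-identifying transformation, destination database) can be wired together as follows. This is an assumption-laden illustration, not the method as claimed: the function names, the `terms` table, and the simple regular-expression tokenizer are all hypothetical, and an in-memory SQLite database stands in for the destination.

```python
import re
import sqlite3
from collections import Counter


def extract_terms(documents, min_length=3):
    """Transformation (hypothetical): count lowercase word terms in the text."""
    counts = Counter()
    for text in documents:
        for token in re.findall(r"[a-z]+", text.lower()):
            if len(token) >= min_length:
                counts[token] += 1
    return counts


def run_pipeline(documents, connection):
    """Run-time path: data source -> transformation -> destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS terms (term TEXT PRIMARY KEY, frequency INTEGER)"
    )
    counts = extract_terms(documents)
    connection.executemany(
        "INSERT OR REPLACE INTO terms VALUES (?, ?)", counts.items()
    )
    connection.commit()
    return counts


# Assumed stand-ins: a list of strings as the source, SQLite as the destination.
source = ["Data mining finds patterns in data.", "Mining text data"]
destination = sqlite3.connect(":memory:")
run_pipeline(source, destination)
loaded = dict(destination.execute("SELECT term, frequency FROM terms"))
```

After the run, `loaded` maps each identified term to its frequency as stored in the destination table.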
Abstract:
The subject invention relates to systems and methods to extend the capabilities of declarative data modeling languages. In one aspect, a declarative data modeling language system is provided. The system includes a data modeling language component that generates one or more data mining models to extract predictive information from local or remote databases. A language extension component facilitates modeling capability in the data modeling language by providing a data sequence model or a time series model within the data modeling language to support various data mining applications.
Abstract:
Systems and methods are provided that cleanse data in Extract, Transform, Load (ETL) environments by employing an outlier detect component positioned in the data pipeline to one or more data warehouses. The outlier detect component employs a cluster mining model to split data into normal and outlier data. Different predictive models can be employed to detect outliers in different data slices to enhance the accuracy of the predictions. In addition, a graphical user interface (GUI) enables a user to interact with cluster groups that are created and/or analyzed by the outlier detect component.
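One way to picture the cluster-based split is the sketch below. It assumes a tiny one-dimensional k-means and treats any cluster holding less than a chosen share of the rows as outlier data; the abstract does not specify the clustering algorithm or the outlier criterion, so `kmeans_1d`, `split_outliers`, and the `min_cluster_share` threshold are hypothetical choices.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means with deterministic, evenly spaced initial centroids."""
    ordered = sorted(values)
    centroids = [
        ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)
    ]
    assignment = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return assignment


def split_outliers(values, k=2, min_cluster_share=0.2):
    """Split values into (normal, outliers); small clusters count as outliers."""
    assignment = kmeans_1d(values, k)
    sizes = Counter(assignment)
    normal, outliers = [], []
    for v, a in zip(values, assignment):
        if sizes[a] / len(values) < min_cluster_share:
            outliers.append(v)
        else:
            normal.append(v)
    return normal, outliers


readings = [10, 11, 12, 10, 11, 12, 11, 10, 12, 500]
normal_rows, outlier_rows = split_outliers(readings)
```

In an ETL setting, `normal_rows` would continue down the pipeline to the warehouse while `outlier_rows` would be diverted for review.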
Abstract:
The subject disclosure pertains to systems and methods for data caching and/or lookup. A data-mining model can be employed to identify data item relationships, associations, and/or affinities. A cache or other fast memory can then be populated based on data mining information. A lookup component can interact with the memory to facilitate expeditious lookup or discovery of information, for example to aid data warehouse population, amongst other things.
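A minimal sketch of this idea follows, assuming that "affinity" means simple pair co-occurrence counts and that the cache is a plain dictionary. The `mine_affinities` function, the `AffinityCache` class, and the `min_support` threshold are hypothetical and only illustrate how mined relationships might drive which items are loaded into fast memory.

```python
from collections import Counter
from itertools import combinations


def mine_affinities(transactions):
    """Count how often each pair of items appears together."""
    pair_counts = Counter()
    for items in transactions:
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


class AffinityCache:
    """Cache (hypothetical) that prefetches items related to the one requested."""

    def __init__(self, slow_lookup, pair_counts, min_support=2):
        self._slow_lookup = slow_lookup
        self._cache = {}
        self._related = {}
        for (a, b), count in pair_counts.items():
            if count >= min_support:
                self._related.setdefault(a, set()).add(b)
                self._related.setdefault(b, set()).add(a)

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._slow_lookup(key)
        # Populate the cache with items the mined affinities tie to this key.
        for other in self._related.get(key, ()):
            if other not in self._cache:
                self._cache[other] = self._slow_lookup(other)
        return self._cache[key]


slow_calls = []


def slow_lookup(key):
    # Stand-in for an expensive lookup, e.g. a query against a remote table.
    slow_calls.append(key)
    return key.upper()


history = [["bread", "butter"], ["bread", "butter", "jam"], ["jam", "tea"]]
cache = AffinityCache(slow_lookup, mine_affinities(history))
first = cache.get("bread")
second = cache.get("butter")
```

Because "bread" and "butter" co-occur often enough, requesting "bread" also loads "butter", so the second request is served from the cache without another slow lookup.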
Abstract:
A language schema is provided that integrates multidimensional extensions (e.g., MDX) and data mining extensions (e.g., DMX) for performing data mining operations on data residing in OLAP cubes. Under the schema, the source query need not be a relational query; it can instead be a multidimensional query formed using MDX, for example. The operations of model creation, training, and prediction are described.
Abstract:
The present web service platform includes a set of application program interfaces (APIs) and a framework for adding services that correspond to the APIs. The web service platform may also support a stored procedure (sproc) that allows combining results from two or more services before transmitting results to an application. The services relate to keyword technologies.