Abstract:
A system and method for predicting the class of future customer calls to a call center. Saved call data is analyzed using a robust tokenizer of a computerized device. The tokenizer transforms the sequence of characters in a call summary field of the saved call data into a sequence of tokens, producing tokenized call data. Multiple maximum entropy (MaxEnt) models are created from the tokenized call data, using the computerized device. The MaxEnt models produce a probability distribution over all classes for the next call to the call center. A conditional random field (CRF) classifier is trained with the MaxEnt models and information from the saved call data, using the computerized device. The CRF classifier uses chronologically ordered sequences of prior calls to the call center and predicts a class for a new call to the call center based on the saved call data. A call class prediction is produced for a new call received from a returning customer based on the CRF classifier and the MaxEnt models.
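The tokenization and MaxEnt steps described above can be illustrated with a minimal sketch. The token pattern, the function names `tokenize` and `maxent_distribution`, and the hand-set weights are all assumptions made for illustration; the abstract does not specify them, and the CRF stage is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical token pattern: alphanumeric runs, keeping internal hyphens/apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(summary):
    """Turn a call-summary string into a sequence of lowercase tokens."""
    return TOKEN_RE.findall(summary.lower())


def maxent_distribution(tokens, weights, classes):
    """Return P(class | tokens) under a MaxEnt (softmax over linear scores) model.

    `weights` maps (class, token) -> float; missing pairs contribute 0.
    In practice the weights would be learned; here they are supplied by hand.
    """
    counts = Counter(tokens)
    scores = {
        c: sum(weights.get((c, t), 0.0) * n for t, n in counts.items())
        for c in classes
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Illustrative usage with made-up classes and weights.
tokens = tokenize("Customer can't log-in; reset PASSWORD x2!")
weights = {("technical", "password"): 2.0, ("billing", "invoice"): 2.0}
dist = maxent_distribution(tokens, weights, ["billing", "technical"])
```

Here `dist` is a probability distribution over the two hypothetical classes; a sequence model such as a CRF could consume such distributions as features for each call in a customer's history.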
Abstract:
The present invention generally relates to systems and methods for executing scripts (sequences of declarative operations) on large data sets. Some implementations store descriptions of previously executed operations and their associated input and output data sets. When similar operations are subsequently executed on the same data, or on a subset, a superset, or any fragment of it, some implementations detect the duplication and access the previously stored output data sets, re-using data and reducing the amount of execution, thus avoiding time-consuming duplicative computations.
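As a rough sketch of the re-use idea, the class below stores each result under a fingerprint of the operation description and its input rows, and returns the stored output when the same pair is seen again. The class name `ResultCache` and the SHA-256 fingerprint are assumptions for illustration; this sketch only handles exact repeats, not the subset, superset, or fragment cases mentioned in the abstract.

```python
import hashlib
import json


class ResultCache:
    """Re-use outputs of previously executed (operation, input) pairs."""

    def __init__(self):
        self._store = {}  # fingerprint -> stored output
        self.hits = 0     # number of times a stored output was re-used

    @staticmethod
    def _key(operation, rows):
        # Fingerprint the operation description together with its input data.
        payload = json.dumps({"op": operation, "rows": rows}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, operation, rows, func):
        key = self._key(operation, rows)
        if key in self._store:
            self.hits += 1
            return self._store[key]  # duplicate detected: skip execution
        result = func(rows)
        self._store[key] = result
        return result


# Illustrative usage: the second call is served from the cache.
calls = []


def total(rows):
    calls.append(1)  # record that real work was done
    return sum(rows)


cache = ResultCache()
first = cache.run("sum", [1, 2, 3], total)
second = cache.run("sum", [1, 2, 3], total)
```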
Abstract:
The present invention generally relates to systems and methods for visual process analysis. The disclosed techniques can include: obtaining a theoretical and an empirical process model; generating a theoretical process layout corresponding to the theoretical process model, where the theoretical process layout is generated using a layout algorithm; generating an empirical process layout corresponding to the empirical process model, where the empirical process layout is generated using the same layout algorithm; superposing the empirical process layout onto the theoretical process layout, such that a superposition layout is generated; annotating the superposition layout based on ugliness indicators, such that an annotated superposition layout is generated; and causing the annotated superposition layout to be displayed.
Abstract:
Systems and methods of data analytics, which in various embodiments enable business analysts to apply certain machine learning and analytics algorithms in a self-service manner by binding them to generic business questions that they can be used to answer in particular domains. The general approach may be to define the application of an algorithm to solve specific problems (questions) for particular combinations of a business domain and a data category. At design time, the algorithm may be linked to canonical data within a data category and programmed to run with this canonical data set. At runtime, given a dataset and its category, and a business domain, a user may choose from the corresponding questions and the system may run the algorithm bound to that question.
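One way to picture the binding of algorithms to business questions is a registry keyed by domain, data category, and question. The sketch below is an assumption-laden illustration: the names `BINDINGS`, `bind`, `questions_for`, and `answer`, as well as the example domain, category, and question, are hypothetical and not taken from the abstract.

```python
from statistics import mean

# Registry: (business domain, data category, question) -> algorithm.
BINDINGS = {}


def bind(domain, category, question):
    """Design time: bind an algorithm to a question for a domain/category pair."""
    def decorator(func):
        BINDINGS[(domain, category, question)] = func
        return func
    return decorator


@bind("retail", "transactions", "What is the average order value?")
def average_order_value(rows):
    # Written against a canonical row shape: each row has an "amount" field.
    return mean(r["amount"] for r in rows)


def questions_for(domain, category):
    """Runtime: list the questions available for a domain and data category."""
    return sorted(q for d, c, q in BINDINGS if d == domain and c == category)


def answer(domain, category, question, rows):
    """Runtime: run the algorithm bound to the chosen question."""
    return BINDINGS[(domain, category, question)](rows)
```

A user would first call `questions_for` with the dataset's category and the business domain, then pass the chosen question and the dataset to `answer`.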
Abstract:
A method and system for processing informational items originating from a plurality of information sources into a derived document for topical analysis thereof. Informational items are collated from one of the sources in accordance with a predetermined plurality of relevant attributes and a key property value common to select ones of the relevant attributes. Informational items associated with the common key property value are then grouped from the plurality of sources to form a document, wherein each informational item therein is marked with its information source. The document is then analyzed for topical identification.
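The grouping step can be sketched as follows, assuming each informational item is a dictionary with `source`, `text`, and `attributes` fields. That item shape, the function name `build_documents`, and the bracketed source marker are illustrative assumptions; the topical analysis itself is not shown.

```python
from collections import defaultdict


def build_documents(items, key_attr):
    """Group items from all sources by a shared key property value.

    Each item's text is prefixed with a marker naming its source, and the
    items sharing a key value are joined into one derived document.
    """
    grouped = defaultdict(list)
    for item in items:
        key = item["attributes"].get(key_attr)
        if key is None:
            continue  # item does not carry the key property
        grouped[key].append(f"[{item['source']}] {item['text']}")
    return {key: "\n".join(lines) for key, lines in grouped.items()}


# Illustrative usage with made-up sources and a hypothetical "case_id" key.
items = [
    {"source": "email", "text": "Printer jams daily.", "attributes": {"case_id": "42"}},
    {"source": "chat", "text": "Still jamming.", "attributes": {"case_id": "42"}},
    {"source": "chat", "text": "Invoice missing.", "attributes": {"case_id": "7"}},
    {"source": "forum", "text": "General remark.", "attributes": {}},
]
docs = build_documents(items, "case_id")
```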
Abstract:
A system and method for characterizing textual data by generating a first data abstraction based on a set of textual data. The first data abstraction can be presented to a user, and the user can provide instructions to make changes to the first data abstraction to generate a second data abstraction. The textual data can be extracted and characterized from the set of textual data using the second data abstraction.
Abstract:
A process definition is partitioned for execution in a system architecture that enables the communication and meta-orchestration of multiple distributed engines. The partitioning method creates a separate script for each group (execution engine, computer, distributed computer, etc.), where each script has the same representation as the original control flow but keeps local services and replaces remote services with data flow messages and synchronization points. This method ensures that the resulting process produces the same result as the original process executed with a single engine. Additional advantages include: the number of partitions of the process is minimized to equal the number of distributed engines; the communication between engines is minimized to only data flow messages; there is no dependency on a specific process representation such as BPMN; and implementation complexity is reduced.
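A highly simplified sketch of the partitioning idea follows, assuming the process is a purely sequential list of `(service, engine)` steps. The function name `partition` and the `("invoke", ...)` and `("sync", ...)` step encodings are hypothetical, and branching or parallel control flow is not modeled.

```python
def partition(process):
    """Create one script per engine from a sequential process definition.

    `process` is a list of (service_name, engine) pairs in control-flow order.
    Each script keeps the original step order: services owned by the engine
    stay as local invocations, while services owned by another engine become
    synchronization points that wait for that engine's data flow message.
    """
    engines = sorted({engine for _, engine in process})
    scripts = {}
    for engine in engines:
        script = []
        for service, owner in process:
            if owner == engine:
                script.append(("invoke", service))
            else:
                script.append(("sync", f"{service}@{owner}"))
        scripts[engine] = script
    return scripts


# Illustrative usage with two hypothetical engines, "A" and "B".
process = [("validate", "A"), ("score", "B"), ("notify", "A")]
scripts = partition(process)
```

Note that the number of scripts equals the number of distinct engines, and every script has the same length as the original process.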