TECHNIQUES FOR CORRECTING LINGUISTIC TRAINING BIAS IN TRAINING DATA

    Publication No.: US20190087728A1

    Publication Date: 2019-03-21

    Application No.: US16134360

    Filing Date: 2018-09-18

    Abstract: In automated assistant systems, a deep-learning model in the form of a long short-term memory (LSTM) classifier is used for mapping questions to classes, with each class having a manually curated answer. A team of experts manually creates the training data used to train this classifier. Relying on human curation often results in linguistic training biases creeping into the training data, since every individual has a specific style of writing natural language and uses some words only in specific contexts. Deep models end up learning these biases instead of the core concept words of the target classes. In order to correct these biases, meaningful sentences are automatically generated using a generative model and then used for training a classification model. For example, a variational autoencoder (VAE) is used as the generative model for generating novel sentences and a language model (LM) is utilized for selecting sentences based on likelihood.
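
    The sketch below is only a hedged illustration of the generate-then-filter idea described in this abstract, not the patented implementation: candidate sentences are sampled from a generative model and only those a language model scores as most likely are kept for classifier training. The `sample_sentence` and `log_likelihood` callables are hypothetical stand-ins for the VAE decoder and the LM.

```python
# Hypothetical interfaces: `sample_sentence` stands in for a VAE decoder sampler and
# `log_likelihood` for a language-model scorer; neither name comes from the patent.
from typing import Callable, List

def augment_training_data(
    sample_sentence: Callable[[], str],      # draws one novel sentence from the VAE
    log_likelihood: Callable[[str], float],  # LM log-probability of a sentence
    n_candidates: int = 1000,
    keep_top: int = 100,
) -> List[str]:
    """Generate candidate sentences and keep the ones the LM finds most likely."""
    candidates = [sample_sentence() for _ in range(n_candidates)]
    ranked = sorted(candidates, key=log_likelihood, reverse=True)
    return ranked[:keep_top]   # selected sentences are added to classifier training data
```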

    SYSTEM AND METHOD FOR PRESCRIPTIVE ANALYTICS

    Publication No.: US20160004987A1

    Publication Date: 2016-01-07

    Application No.: US14791973

    Filing Date: 2015-07-06

    Abstract: The present subject matter discloses a system and method for executing prescriptive analytics. Simulation is performed on input data (x_input) and simulation parameters (μ) to generate simulated data (D). Further, forecast data may be predicted by processing the simulated data (D) using a predictive model (M). Further, a prescriptive value (x′) may be determined based on the forecast data by using an optimization model. The prescriptive value (x′) may be determined such that an objective function associated with the optimization model is optimized, whereby optimization of the objective function indicates that the business objective is achieved. Further, the steps of simulating, predicting and determining may be performed iteratively until the objective function can no longer be further optimized, thereby satisfying a predefined condition. Further, at each iteration except the first, the input data (x_input) is the prescriptive value (x′) determined at the immediately preceding iteration, whereby at the first iteration the input data (x_input) is reference data.
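
    Below is a minimal sketch of the simulate-predict-optimize loop described above, assuming hypothetical `simulate`, `predict`, and `optimize` callables standing in for the simulation model, the predictive model (M), and the optimization model; the stopping rule (no further improvement within a tolerance) is an assumption.

```python
# Sketch of the iterative prescriptive loop; all three callables are placeholders.
from typing import Callable, Tuple

def prescriptive_loop(
    x_ref: float,                                    # reference data used at the first iteration
    mu: dict,                                        # simulation parameters (μ)
    simulate: Callable[[float, dict], list],         # (x_input, μ) -> simulated data D
    predict: Callable[[list], list],                 # D -> forecast data (predictive model M)
    optimize: Callable[[list], Tuple[float, float]], # forecast -> (x', objective value)
    tol: float = 1e-6,
    max_iter: int = 100,
) -> float:
    x_input, best_obj = x_ref, float("-inf")
    for _ in range(max_iter):
        forecast = predict(simulate(x_input, mu))
        x_prime, obj = optimize(forecast)
        if obj <= best_obj + tol:        # objective no longer improves: predefined stop condition
            break
        best_obj, x_input = obj, x_prime # feed x' back as the next iteration's input data
    return x_input
```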

    SYSTEMS AND METHODS FOR CLASSIFICATION OF MULTI-DIMENSIONAL TIME SERIES OF PARAMETERS

    Publication No.: US20200012938A1

    Publication Date: 2020-01-09

    Application No.: US16363038

    Filing Date: 2019-03-25

    Abstract: Traditional systems and methods have implemented hand-crafted feature extraction from varying-length time series, which results in complexity and requires domain knowledge. Building classification models requires large amounts of labeled data and is computationally expensive. Embodiments of the present disclosure implement learning models for classification tasks on multi-dimensional time series by performing feature extraction from an entity's parameters via an unsupervised encoder and building a non-temporal linear classifier model. A fixed-dimensional feature vector is output using a pre-trained unsupervised encoder, which acts as an off-the-shelf feature extractor. Extracted features are concatenated to learn a non-temporal linear classification model, and a weight is assigned to each extracted feature during learning, which helps to determine the relevant parameters for each class. The mapping from parameters to target class is considered while constraining the linear model to use only a subset of the large number of features.
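
    The following sketch assumes a frozen `encoder` callable that maps a single parameter's time series to a fixed-dimensional NumPy vector (standing in for the pre-trained unsupervised encoder); the L1-penalized logistic regression is one common way to constrain a linear model to a subset of features, not necessarily the patent's choice.

```python
# Concatenate per-parameter encoder features and fit a sparse linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(series_per_param, encoder):
    """Encode each parameter's time series and concatenate into one fixed-length vector."""
    return np.concatenate([encoder(series) for series in series_per_param])

def train_linear_classifier(dataset, labels, encoder):
    X = np.stack([extract_features(sample, encoder) for sample in dataset])
    # The L1 penalty keeps only a subset of the concatenated features, so the learned
    # weights hint at which parameters are relevant for each class.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X, labels)
    return clf
```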

    FAILED AND CENSORED INSTANCES BASED REMAINING USEFUL LIFE (RUL) ESTIMATION OF ENTITIES

    Publication No.: US20200012921A1

    Publication Date: 2020-01-09

    Application No.: US16352587

    Filing Date: 2019-03-13

    Abstract: Estimating Remaining Useful Life (RUL) from multi-sensor time series data is difficult through manual inspection. Current machine learning and data analytics methods for RUL estimation require a large number of failed instances for training, which are rarely available in practice, and these methods cannot use information from currently operational censored instances since their failure time is unknown. Embodiments of the present disclosure provide systems and methods for estimating RUL from time series data by implementing an LSTM-RNN based ordinal regression technique, wherein, during training, the RUL value of failed instance(s) is encoded into a vector that is given as a target to the model. Unlike a failed instance, the exact RUL for a censored instance is unknown. To use the censored instances, target vectors are generated and the objective function is modified for training; the trained LSTM-RNN based ordinal regression model is then applied to an input test time series for RUL estimation.
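
    Below is a minimal sketch of ordinal-regression targets with censoring. The exact encoding and masking scheme is an assumption rather than the patent's specification: entry k of the target answers "is RUL greater than k?", and for a censored instance only the entries below its known minimum RUL are treated as known.

```python
# Target/mask construction and a masked loss for ordinal RUL regression (illustrative).
import torch

def encode_targets(rul: float, max_rul: int, censored: bool):
    """rul: true RUL for a failed instance, or the known minimum RUL for a censored one.
    Returns (target, mask) vectors of length max_rul."""
    thresholds = torch.arange(max_rul, dtype=torch.float32)
    target = (rul > thresholds).float()      # ordinal encoding of the RUL value
    if censored:
        mask = (thresholds < rul).float()    # beyond the known survival time, RUL is unknown
    else:
        mask = torch.ones(max_rul)           # failed instance: the full target is known
    return target, mask

def masked_ordinal_loss(logits, target, mask):
    """Binary cross-entropy computed over the known entries only."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, target, reduction="none")
    return (bce * mask).sum() / mask.sum().clamp(min=1.0)
```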

    BILSTM-SIAMESE NETWORK BASED CLASSIFIER FOR IDENTIFYING TARGET CLASS OF QUERIES AND PROVIDING RESPONSES THEREOF

    Publication No.: US20190080225A1

    Publication Date: 2019-03-14

    Application No.: US15912382

    Filing Date: 2018-03-05

    Abstract: Organizations are constantly flooded with questions, ranging from the mundane to the unanswerable. The respective departments therefore actively look for automated assistance, especially to alleviate the burden of routine but time-consuming tasks. The embodiments of the present disclosure provide a BiLSTM-Siamese network based classifier for identifying the target class of queries and providing responses to queries pertaining to the identified target class, which acts as an automated assistant that alleviates the burden of answering queries in well-defined domains. The Siamese Model (SM) is trained for a epochs, and then the same Base-Network is used to train the Classification Model (CM) for b epochs, iteratively, until the best accuracy is observed on the validation set, wherein the SM learns which sentences are semantically similar or dissimilar while the CM learns to predict the target class of every user query. Here a and b are hyperparameters and are tuned for best performance on the validation set.
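
    The sketch below illustrates the alternating schedule described above: a shared BiLSTM Base-Network is trained inside a Siamese model for a epochs, then reused in a classification model for b epochs, repeating while validation accuracy keeps improving. The epoch-level training and evaluation routines are passed in as hypothetical callables, and the module dimensions are assumptions.

```python
# Shared base network plus an alternating Siamese/classification training schedule.
import torch.nn as nn

class BaseNetwork(nn.Module):
    """BiLSTM sentence encoder shared by the Siamese and classification models."""
    def __init__(self, vocab_size: int, emb_dim: int = 128, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, tokens):                 # tokens: (batch, seq_len) token ids
        out, _ = self.bilstm(self.emb(tokens))
        return out.mean(dim=1)                 # sentence embedding: (batch, 2 * hidden)

def alternate_training(base, train_siamese, train_classifier, evaluate, a, b, patience=3):
    """train_siamese / train_classifier each run one epoch using the shared base network."""
    best_acc, stale = 0.0, 0
    while stale < patience:
        for _ in range(a):
            train_siamese(base)                # SM: learn which sentences are similar/dissimilar
        for _ in range(b):
            train_classifier(base)             # CM: learn to predict the target class of a query
        acc = evaluate(base)                   # accuracy on the validation set
        best_acc, stale = (acc, 0) if acc > best_acc else (best_acc, stale + 1)
    return base
```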

    SYSTEMS AND METHODS FOR PREDICTIVE RELIABILITY MINING

    Publication No.: US20170109222A1

    Publication Date: 2017-04-20

    Application No.: US15057882

    Filing Date: 2016-03-01

    Abstract: Systems and methods for predictive reliability mining are provided that enable prediction of unexpected emerging failures in the future without waiting for actual failures to start occurring in significant numbers. Sets of discriminative Diagnostic Trouble Codes (DTCs) from connected machines in a population are identified before failure of the associated parts. A temporal conditional dependence model is generated based on the temporal dependence between part failures in past failure data and the identified sets of discriminative DTCs. Future failures are predicted based on the generated temporal conditional dependence model, and root cause analysis of the predicted future failures is performed for predictive reliability mining. The probability of failure is computed based on both occurrence and non-occurrence of DTCs. The root cause analysis enables identifying a subset of the population both when an early warning is generated and when it is not.
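
    The patent's temporal conditional dependence model is not detailed in this abstract, so the sketch below is only an illustrative simplification: it estimates smoothed failure probabilities from historical counts, conditioned on either the occurrence or the non-occurrence of a given DTC.

```python
# Illustrative conditional failure-probability estimate from past DTC/failure history.
from collections import Counter

def fit_dtc_failure_model(history):
    """history: iterable of (observed_dtcs: set[str], failed: bool), one entry per machine."""
    fail_counts, ok_counts = Counter(), Counter()
    n_fail = n_ok = 0
    for dtcs, failed in history:
        (fail_counts if failed else ok_counts).update(dtcs)
        n_fail, n_ok = n_fail + failed, n_ok + (not failed)

    def p_failure_given(dtc: str, present: bool = True, alpha: float = 1.0) -> float:
        # Laplace-smoothed P(failure | DTC occurred) or P(failure | DTC did not occur).
        f = fail_counts[dtc] if present else n_fail - fail_counts[dtc]
        o = ok_counts[dtc] if present else n_ok - ok_counts[dtc]
        return (f + alpha) / (f + o + 2 * alpha)

    return p_failure_given
```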

    METHOD AND SYSTEM FOR FUSING BUSINESS DATA FOR DISTRIBUTIONAL QUERIES

    Publication No.: US20170004411A1

    Publication Date: 2017-01-05

    Application No.: US15192215

    Filing Date: 2016-06-24

    CPC classification number: G06N7/005 G06F16/2462 G06F16/2471 G06N20/00

    Abstract: The present disclosure relates to business data processing and facilitates fusing business data spanning disparate sources for processing distributional queries in enterprise business intelligence applications. Particularly, the method comprises defining a Bayesian network based on one or more attributes associated with raw data spanning a plurality of disparate sources; pre-processing the raw data based on the Bayesian network to compute conditional probabilities therein as parameters; joining the one or more attributes in the raw data using the conditional probabilities; and executing probabilistic inference from a database of the parameters by employing an SQL engine. The Bayesian network may be validated based on an estimation error computed by comparing the results of processing a set of validation queries on the raw data and on the Bayesian network.
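
    As a hedged illustration of "probabilistic inference from a database of the parameters by employing an SQL engine", the sketch below stores the conditional probabilities of a tiny two-node network (A -> B) as tables and answers a marginal (distributional) query with a single SQL join; the schema and query are illustrative, not the disclosed design.

```python
# Bayesian-network parameters as SQL tables; marginal P(B) via the chain rule in SQL.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE p_a  (a TEXT, prob REAL);           -- prior P(A)
    CREATE TABLE p_b_a(b TEXT, a TEXT, prob REAL);   -- conditional P(B | A)
    INSERT INTO p_a   VALUES ('low', 0.6), ('high', 0.4);
    INSERT INTO p_b_a VALUES ('ok', 'low', 0.9), ('fail', 'low', 0.1),
                             ('ok', 'high', 0.5), ('fail', 'high', 0.5);
""")

# P(B) = sum over A of P(A) * P(B | A), computed entirely by the SQL engine.
rows = con.execute("""
    SELECT p_b_a.b, SUM(p_a.prob * p_b_a.prob) AS prob
    FROM p_b_a JOIN p_a ON p_b_a.a = p_a.a
    GROUP BY p_b_a.b
""").fetchall()
print(dict(rows))   # {'fail': 0.26, 'ok': 0.74}
```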

    PROMPT AUGMENTED GENERATIVE REPLAY VIA SUPERVISED CONTRASTIVE TRAINING FOR LIFELONG INTENT DETECTION

    Publication No.: US20240013094A1

    Publication Date: 2024-01-11

    Application No.: US18215972

    Filing Date: 2023-06-29

    CPC classification number: G06N20/00 G06F40/35 G06F40/284 G06F40/40

    Abstract: Embodiments disclosed herein model lifelong intent detection as class-incremental learning, where a new set of intents/classes is added at each incremental step. To address the issue of catastrophic forgetting during lifelong intent detection (LID), an incremental learner is provided with Prompt Augmented Generative Replay, wherein, unlike existing approaches that store real samples in replay memory, only concept words obtained from old intents are stored, which reduces memory consumption and speeds up incremental training while still preventing the old intents from being forgotten. Joint training of the incremental learner is carried out for LID and pseudo-labeled utterance generation, with the objective of classifying a user utterance into one of multiple pre-defined intents by minimizing a total loss function comprising a LID loss function, a Labeled Utterance Generation loss function, a Supervised Contrastive Training loss function, and a Knowledge Distillation loss function.
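
    The sketch below reflects only what the abstract states about the objective, namely that the total loss sums an intent-detection term, a labeled-utterance-generation term, a supervised-contrastive term, and a knowledge-distillation term; the weights and the particular softened-KL distillation form shown here are assumptions, not the disclosed formulation.

```python
# Combined objective for lifelong intent detection (weights and KD form are placeholders).
import torch.nn.functional as F

def knowledge_distillation_loss(new_logits, old_logits, T: float = 2.0):
    """Standard softened-KL distillation term between old and new model outputs."""
    p_old = F.softmax(old_logits / T, dim=-1)
    log_p_new = F.log_softmax(new_logits / T, dim=-1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean") * (T * T)

def total_loss(lid_loss, gen_loss, scl_loss, kd_loss,
               w_gen: float = 1.0, w_scl: float = 1.0, w_kd: float = 1.0):
    """Sum of LID, Labeled Utterance Generation, Supervised Contrastive, and KD losses."""
    return lid_loss + w_gen * gen_loss + w_scl * scl_loss + w_kd * kd_loss
```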
