EXPLAINING OUTLIERS IN TIME SERIES AND EVALUATING ANOMALY DETECTION METHODS

    Publication Number: US20220253426A1

    Publication Date: 2022-08-11

    Application Number: US17170164

    Application Date: 2021-02-08

    IPC Classification: G06F16/23 G06N3/04 G06N3/08

    Abstract: Time series data can be received. A machine learning model can be trained using the time series data. A contaminating process can be estimated based on the time series data, the contaminating process including outliers associated with the time series data. A parameter associated with the contaminating process can be determined. Based on the trained machine learning model and the parameter associated with the contaminating process, a single-valued metric can be determined, which represents an impact of the contaminating process on the machine learning model's future prediction. A plurality of different outlier detecting machine learning models can be used to estimate the contaminating process, and the single-valued metric can be determined for each of the plurality of different outlier detecting machine learning models. The plurality of different outlier detecting machine learning models can be ranked according to the associated single-valued metric.
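
    One way such a single-valued impact metric could look is sketched below. This is a minimal illustration, not the patented method: it assumes an AR(1) surrogate forecaster, linear interpolation as the outlier cleanup, and the raw-minus-cleaned forecast error as the metric; `impact_metric` and `rank_detectors` are hypothetical names.

```python
import numpy as np

def impact_metric(series, outlier_mask, horizon=5):
    """Illustrative single-valued metric (an assumption, not the patent's
    formula): difference in held-out forecast error between a model fit on
    the raw series and one fit on the series with detected outliers
    replaced by interpolation."""
    def fit_ar1(y):
        # Least-squares slope of a one-step autoregressive model.
        x, t = y[:-1], y[1:]
        return np.dot(x, t) / np.dot(x, x)

    # Replace flagged outliers by interpolating over the clean points.
    cleaned = series.copy()
    bad = np.where(outlier_mask)[0]
    good = np.where(~outlier_mask)[0]
    cleaned[bad] = np.interp(bad, good, series[good])

    train, test = series[:-horizon], series[-horizon:]
    a_raw = fit_ar1(train)
    a_clean = fit_ar1(cleaned[:-horizon])
    err_raw = np.mean((test[1:] - a_raw * test[:-1]) ** 2)
    err_clean = np.mean((test[1:] - a_clean * test[:-1]) ** 2)
    # Larger value => the detector's cleanup helps future prediction more.
    return err_raw - err_clean

def rank_detectors(series, masks):
    """Rank outlier detectors (name -> boolean mask) by impact metric."""
    scores = {name: impact_metric(series, mask) for name, mask in masks.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

    A detector that flags nothing leaves the series unchanged, so its metric is exactly zero; detectors whose cleanup improves the surrogate model's forecasts score higher and rank first.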

    LENGTH PERTURBATION TECHNIQUES FOR IMPROVING GENERALIZATION OF DEEP NEURAL NETWORK ACOUSTIC MODELS

    Publication Number: US20240170005A1

    Publication Date: 2024-05-23

    Application Number: US18057967

    Application Date: 2022-11-22

    IPC Classification: G10L21/04

    CPC Classification: G10L21/04

    Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to length perturbation techniques for improving generalization of DNN acoustic models. A computer-implemented system can comprise a memory that can store computer executable components. The computer-implemented system can further comprise a processor that can execute the computer executable components stored in the memory, wherein the computer executable components can comprise a frame skipping component that can remove one or more frames from an acoustic utterance via frame skipping. The computer executable components can further comprise a frame insertion component that can insert one or more replacement frames into the acoustic utterance via frame insertion, replacing the one or more removed frames with the one or more replacement frames to enable length perturbation of the acoustic utterance.
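
    The frame-skipping and frame-insertion components could be sketched as a single augmentation function over a matrix of acoustic feature frames. The per-frame probabilities and the neighbor-averaging replacement frame are illustrative assumptions, not details from the patent.

```python
import numpy as np

def length_perturb(frames, skip_prob=0.1, insert_prob=0.1, seed=None):
    """Sketch of length perturbation on a (T, D) array of feature frames.
    Frames are randomly dropped (frame skipping) and replacement frames,
    here the average of two neighbors, are randomly inserted."""
    rng = np.random.default_rng(seed)
    out = []
    for t in range(len(frames)):
        if rng.random() < skip_prob:
            continue  # frame skipping: drop this frame
        out.append(frames[t])
        if t + 1 < len(frames) and rng.random() < insert_prob:
            # frame insertion: a replacement frame interpolated from
            # the current and next frame (an illustrative choice)
            out.append((frames[t] + frames[t + 1]) / 2.0)
    # Keep at least one frame so downstream models get a valid utterance.
    return np.stack(out) if out else frames[:1].copy()
```

    Applied during training, each pass over an utterance yields a slightly shorter or longer variant, which is the source of the regularization effect the abstract describes.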

    DYNAMIC COMPUTATION IN DECENTRALIZED DISTRIBUTED DEEP LEARNING TRAINING

    Publication Number: US20220012584A1

    Publication Date: 2022-01-13

    Application Number: US16925178

    Application Date: 2020-07-09

    IPC Classification: G06N3/08 G06N3/04

    Abstract: Embodiments of a method are disclosed. The method includes performing decentralized distributed deep learning training on a batch of training data. Additionally, the method includes determining a training time during which a learner performs the decentralized distributed deep learning training on the batch of training data. Further, the method includes generating a table of this training time and the processing times of other learners performing the decentralized distributed deep learning training on other batches of training data. The method also includes determining that the learner is a straggler based on the table and a threshold for the training time. Additionally, in response to determining that the learner is the straggler, the method includes modifying a processing aspect of the straggler to reduce its future training time for performing the decentralized distributed deep learning training on a new batch of training data.
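
    The table-and-threshold straggler check, and one possible "processing aspect" modification, could be sketched as follows. The mean-based threshold factor and the batch-size shrink are illustrative assumptions; the patent leaves both open.

```python
def find_stragglers(training_times, threshold_factor=1.5):
    """Sketch: the 'table' is a dict of learner -> training time for the
    current batch. A learner is flagged as a straggler when its time
    exceeds threshold_factor times the mean (factor is an assumption)."""
    mean_time = sum(training_times.values()) / len(training_times)
    threshold = threshold_factor * mean_time
    return {learner for learner, t in training_times.items() if t > threshold}

def shrink_straggler_batches(batch_sizes, stragglers, factor=0.8):
    """One possible 'processing aspect' to modify: give stragglers a
    smaller local batch so their next iteration finishes sooner."""
    return {learner: max(1, int(size * factor)) if learner in stragglers else size
            for learner, size in batch_sizes.items()}
```

    In a decentralized setting each learner can run this check itself after exchanging timing entries with its neighbors, which is what distinguishes this scheme from a parameter-server-driven one.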

    TRANSFORMER-BASED ENCODING INCORPORATING METADATA

    Publication Number: US20220358288A1

    Publication Date: 2022-11-10

    Application Number: US17308575

    Application Date: 2021-05-05

    Abstract: From metadata of a corpus of natural language text documents, a relativity matrix is constructed, a row-column intersection in the relativity matrix corresponding to a relationship between two instances of a type of metadata. An encoder model is trained, generating a trained encoder model, to compute an embedding corresponding to a token of a natural language text document within the corpus and the relativity matrix, the encoder model comprising a first encoder layer, the first encoder layer comprising a token embedding portion, a relativity embedding portion, a token self-attention portion, a metadata self-attention portion, and a fusion portion, the training comprising adjusting a set of parameters of the encoder model.
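
    The relativity matrix construction could be sketched as below. The abstract leaves the relationship between two metadata instances open; binary equality of the metadata value (e.g., same author) is an illustrative assumption here.

```python
import numpy as np

def relativity_matrix(metadata_instances):
    """Sketch: build a relativity matrix for one metadata type.
    Entry (i, j) encodes a relationship between metadata instance i and
    instance j; here it is 1.0 when the two values are equal (an
    assumed relationship), 0.0 otherwise."""
    n = len(metadata_instances)
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if metadata_instances[i] == metadata_instances[j]:
                R[i, j] = 1.0
    return R
```

    The relativity embedding portion of the first encoder layer would consume such a matrix alongside the token embeddings, with the metadata self-attention and fusion portions combining the two streams; those portions are not sketched here.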

    DYNAMIC NETWORK BANDWIDTH IN DISTRIBUTED DEEP LEARNING TRAINING

    Publication Number: US20220012642A1

    Publication Date: 2022-01-13

    Application Number: US16925192

    Application Date: 2020-07-09

    IPC Classification: G06N20/20 H04L12/24 G06N3/02

    Abstract: Embodiments of a method are disclosed. The method includes performing distributed deep learning training on a batch of training data. The method also includes determining training times representing an amount of time between a beginning batch time and an end batch time. Further, in response to a centralized parameter server determining that a learner is a communication straggler, the method includes modifying a communication aspect of the communication straggler to reduce a future network communication time for the communication straggler to send a future result of the distributed deep learning training on a new batch of training data.
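
    The parameter server's communication-straggler check, and one possible "communication aspect" modification, could be sketched as follows. The fixed threshold and the gradient-compression adjustment are illustrative assumptions, not details from the patent.

```python
def find_comm_stragglers(comm_times, threshold):
    """Sketch of the centralized parameter server's check: a learner is a
    communication straggler when its network communication time for
    sending its result exceeds a threshold (threshold is an assumption)."""
    return {learner for learner, t in comm_times.items() if t > threshold}

def increase_compression(ratios, stragglers, step=0.5):
    """One possible 'communication aspect' to modify: increase gradient
    compression for stragglers (a smaller ratio means fewer bytes sent),
    so their next result transfers faster."""
    return {learner: r * step if learner in stragglers else r
            for learner, r in ratios.items()}
```

    Unlike the decentralized variant above, here the centralized parameter server observes every learner's communication time directly and pushes the adjusted setting back to the flagged learner.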