-
11.
Publication number: US11694697B2
Publication date: 2023-07-04
Application number: US16915160
Filing date: 2020-06-29
CPC classification number: G10L19/005 , G10L15/08 , G10L15/14 , G10L15/142 , G10L15/20 , G10L15/02 , G10L25/18 , G10L25/21 , G10L2015/025 , G10L2019/0012
Abstract: A system and method are presented for the correction of audio packet loss in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and the acoustic models may be used to replace the noisy signal.
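Below is a minimal Python/NumPy sketch of the two recognition-stage ideas the abstract names: normalizing recognition scores for frames affected by packet loss, and replacing lost frames with a best estimate built from previous frames. All function names, array shapes, and the averaging heuristic are illustrative assumptions, not the patented method.

```python
import numpy as np

def replace_lost_frames(frames, lost, history=3):
    # frames: (T, D) acoustic feature vectors; lost: (T,) bool mask of lost frames.
    # Each lost frame is replaced by a best estimate: the mean of the
    # most recent preceding frames (information from previous frames).
    frames = frames.copy()
    for t in np.where(lost)[0]:
        prev = frames[max(0, t - history):t]
        if len(prev):
            frames[t] = prev.mean(axis=0)
    return frames

def normalize_scores(log_scores, lost, neutral=-5.0):
    # Give frames affected by packet loss a neutral acoustic log-score so
    # unreliable evidence does not skew the recognition result.
    log_scores = np.asarray(log_scores, dtype=float).copy()
    log_scores[lost] = neutral
    return log_scores
```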
-
12.
Publication number: US20200151248A1
Publication date: 2020-05-14
Application number: US16677989
Filing date: 2019-11-08
Inventor: Felix Immanuel Wyss , Aravind Ganapathiraju , Pavan Buduguppa
Abstract: A system and method are presented for model derivation for entity prediction. An LSTM with 100 memory cells is used in the system architecture. Sentences are truncated and provided, together with feature information, to a named-entity recognition model. A forward and a backward pass of the LSTM are performed, and the outputs of the two passes are concatenated. The concatenated bi-directional LSTM encodings are obtained for the various features of each word. A fully connected set of neurons shared across all encoded words is then applied, and the final encoded outputs, with dimension equal to the number of entities, are determined.
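A minimal PyTorch sketch of the architecture outlined above: a bi-directional LSTM with 100 memory cells per direction, concatenated forward/backward encodings, and a fully connected layer shared across words whose output dimension equals the number of entity classes. The embedding layer, every dimension other than the 100 cells, and the example sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EntityTagger(nn.Module):
    def __init__(self, vocab_size, num_entities, emb_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 100 memory cells per direction, as in the abstract
        self.bilstm = nn.LSTM(emb_dim, 100, batch_first=True, bidirectional=True)
        # fully connected layer shared across all encoded words;
        # output dimension equals the number of entity classes
        self.proj = nn.Linear(2 * 100, num_entities)

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq_len, emb_dim)
        enc, _ = self.bilstm(x)        # concatenated forward/backward encodings
        return self.proj(enc)          # (batch, seq_len, num_entities)

# usage (toy sizes):
# logits = EntityTagger(vocab_size=5000, num_entities=9)(torch.randint(0, 5000, (2, 20)))
```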
-
13.
Publication number: US20190244611A1
Publication date: 2019-08-08
Application number: US16265148
Filing date: 2019-02-01
Inventor: Tejas Godambe , Aravind Ganapathiraju
IPC: G10L15/22 , G10L15/16 , G10L15/14 , G10L15/02 , G10L15/197
CPC classification number: G10L15/22 , G10L15/02 , G10L15/14 , G10L15/16 , G10L15/197 , G10L2015/025
Abstract: A system and method are presented for the automatic filtering of test utterance mismatches in automatic speech recognition (ASR) systems. Test data are evaluated for a match between audio and text in a language-independent manner. Utterances with a mismatch are identified and isolated for either removal or manual verification to prevent incorrect measurements of ASR system performance. In one embodiment, contiguous stretches of low probabilities in every utterance are searched for and removed; such segments may be intra-word or cross-word. In another embodiment, a score may be determined from the log DNN probability of every word in each utterance. Words may be sorted in order of score, and the utterances containing the lowest word scores are removed.
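A small Python sketch, assuming a hypothetical data layout, of the second embodiment: each utterance is summarized by its worst per-word log DNN probability, the utterances are sorted, and the lowest-scoring ones are dropped as likely audio/text mismatches. The data structure and keep fraction are assumptions.

```python
def filter_mismatched_utterances(utterances, keep_fraction=0.9):
    # utterances: list of (utterance_id, word_log_probs) pairs, where
    # word_log_probs holds the log DNN probability of every word.
    # Summarize each utterance by its worst (lowest) word score.
    scored = [(uid, min(word_scores)) for uid, word_scores in utterances]
    scored.sort(key=lambda item: item[1], reverse=True)  # best-matched first
    n_keep = int(len(scored) * keep_fraction)
    return [uid for uid, _ in scored[:n_keep]]

# usage:
# kept = filter_mismatched_utterances([("utt1", [-0.1, -0.2]), ("utt2", [-0.1, -9.4])])
```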
-
14.
Publication number: US20190080701A1
Publication date: 2019-03-14
Application number: US16186851
Filing date: 2018-11-12
Abstract: A system and method are presented for the correction of audio packet loss in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and the acoustic models may be used to replace the noisy signal.
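As a complement to the sketch under the related publication above, here is one plausible reading of "the actual input signal may be rectified": samples lost to dropped packets are re-estimated by interpolating between the surrounding received samples, so the acoustic models themselves stay untouched. The interpolation choice and names are assumptions, not the patented method.

```python
import numpy as np

def rectify_signal(signal, lost_mask):
    # signal: 1-D array of audio samples; lost_mask: bool array, True where
    # the samples arrived in lost packets. Lost samples are re-estimated by
    # linear interpolation from the neighboring received samples.
    good = np.where(~lost_mask)[0]
    bad = np.where(lost_mask)[0]
    repaired = signal.astype(float).copy()
    if len(good) and len(bad):
        repaired[bad] = np.interp(bad, good, signal[good])
    return repaired
```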
-
15.
Publication number: US11195514B2
Publication date: 2021-12-07
Application number: US16414885
Filing date: 2019-05-17
Inventor: Ramasubramanian Sundaram , Aravind Ganapathiraju , Yingyi Tan
Abstract: A system and method are presented for a multiclass approach to confidence modeling in automatic speech recognition systems. A confidence model may be trained offline using supervised learning. A decoding module within the system generates features for the audio files in the audio data. The features are used to generate a hypothesized segment of speech, which is compared to a known segment of speech using edit distances. Each comparison is labeled with one of a plurality of output classes, where the labels correspond to the degree to which the speech was correctly converted to text. The trained confidence models can be applied in a variety of systems, including interactive voice response systems, keyword spotters, and open-ended dialog systems.
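A minimal Python sketch of the labeling step described above: a hypothesized segment is compared with the known segment by word-level edit distance, and the comparison is mapped to one of several output classes. The three-class scheme and its boundaries are illustrative assumptions.

```python
def word_edit_distance(hyp, ref):
    # Word-level Levenshtein distance between hypothesized and known segments.
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def confidence_label(hyp, ref):
    # Map normalized edit distance to an output class:
    # 0 = correct, 1 = partially correct, 2 = incorrect (assumed scheme).
    wer = word_edit_distance(hyp, ref) / max(len(ref), 1)
    if wer == 0.0:
        return 0
    return 1 if wer <= 0.5 else 2

# usage: confidence_label("set an alarm".split(), "set an alarm for nine".split())
```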
-
16.
Publication number: US11134155B1
Publication date: 2021-09-28
Application number: US17139033
Filing date: 2020-12-31
Inventor: Felix Immanuel Wyss , Ramasubramanian Sundaram , Aravind Ganapathiraju
Abstract: A method for automated generation of contact center system embeddings according to one embodiment includes determining, by a computing system, contact center system agents, contact center system agent skills, and/or contact center system virtual queue experiences; generating, by the computing system, a matrix representation based on the contact center system agents, the contact center system agent skills, and/or the contact center system virtual queue experiences; generating, by the computing system and based on the matrix representation, contact center system agent identifiers, contact center system agent skills identifiers, and/or contact center system virtual queue identifiers; transforming, by the computing system, the contact center system agent identifiers, the contact center system agent skills identifiers, and/or the contact center system virtual queue identifiers into the contact center system agent embeddings, contact center system agent skills embeddings, and/or contact center system virtual queue embeddings, wherein weights of the contact center system agent embeddings, the contact center system agent skills embeddings, and/or the contact center system virtual queue embeddings are randomly initialized; and training, by the computing system, the contact center system agent embeddings, the contact center system agent skills embeddings, and/or the contact center system virtual queue embeddings by applying machine learning to obtain final weights of the contact center system agent embeddings, the contact center system agent skills embeddings, and/or the contact center system virtual queue embeddings.
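A brief PyTorch sketch of the embedding setup described above: integer identifiers for agents, agent skills, and virtual queues are transformed into embeddings whose weights are randomly initialized and then trained. The embedding dimension and the downstream training objective are not specified in the abstract and are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class ContactCenterEmbeddings(nn.Module):
    # Embeddings for agents, agent skills, and virtual queues.
    # nn.Embedding weights are randomly initialized; the final weights are
    # obtained by training against some downstream objective (not shown here).
    def __init__(self, n_agents, n_skills, n_queues, dim=32):
        super().__init__()
        self.agent = nn.Embedding(n_agents, dim)
        self.skill = nn.Embedding(n_skills, dim)
        self.queue = nn.Embedding(n_queues, dim)

    def forward(self, agent_ids, skill_ids, queue_ids):
        # integer identifiers are transformed into dense embeddings
        return self.agent(agent_ids), self.skill(skill_ids), self.queue(queue_ids)
```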
-
17.
Publication number: US10360898B2
Publication date: 2019-07-23
Application number: US16000742
Filing date: 2018-06-05
Inventor: Aravind Ganapathiraju , Yingyi Tan , Felix Immanuel Wyss , Scott Allen Randal
Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected and a Figure of Merit (FOM) is computed for it. Relevant features that describe each word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned; this mapping can be generalized via a suitable machine learning algorithm and used to predict the FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust internals of the speech recognition engine to achieve consistent behavior for all inputs across various settings of confidence values.
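A small NumPy sketch of learning a mapping from keyword features to FOM and predicting the FOM for a new keyword. Ordinary least squares stands in here for the unspecified "suitable machine learning algorithm", and the feature set is an assumption.

```python
import numpy as np

def fit_fom_predictor(keyword_features, fom_values):
    # keyword_features: (n_keywords, n_features) array of descriptive features
    # (e.g. length, phone count, frequency); fom_values: (n_keywords,) targets.
    X = np.column_stack([keyword_features, np.ones(len(keyword_features))])  # add bias term
    weights, *_ = np.linalg.lstsq(X, fom_values, rcond=None)
    return weights

def predict_fom(weights, features):
    # Predict FOM for a new keyword from its feature vector.
    return float(np.dot(np.append(features, 1.0), weights))
```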
-
18.
Publication number: US20190172442A1
Publication date: 2019-06-06
Application number: US16272130
Filing date: 2019-02-11
Inventor: Rajesh Dachiraju , E. Veera Raghavendra , Aravind Ganapathiraju
IPC: G10L13/027 , G10L13/02 , G10L13/06
Abstract: A system and method are presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. The excitation signal may be formed by using a plurality of sub-band templates instead of a single one. The plurality of sub-band templates may be combined to form the excitation signal wherein the proportion in which the templates are added is dynamically based on determined energy coefficients. These coefficients vary from frame to frame and are learned, along with the spectral parameters, during feature training. The coefficients are appended to the feature vector, which comprises spectral parameters and is modeled using HMMs, and the excitation signal is determined.
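A minimal NumPy sketch of the combination step described above: one excitation frame is formed as a weighted sum of sub-band glottal pulse templates, with the per-frame energy coefficients setting the proportions. Array shapes and names are assumptions.

```python
import numpy as np

def build_excitation_frame(subband_templates, energy_coeffs):
    # subband_templates: (n_bands, frame_len) array, one template per sub-band.
    # energy_coeffs: (n_bands,) coefficients for this frame, learned along with
    # the spectral parameters during feature training.
    return energy_coeffs @ subband_templates  # (frame_len,) excitation frame

# coefficients vary frame to frame, so the template mix changes over time:
# excitation = np.concatenate([build_excitation_frame(T, c) for c in coeff_sequence])
```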
-