-
Publication No.: US11908454B2
Publication date: 2024-02-20
Application No.: US17539752
Filing date: 2021-12-01
CPC classification: G10L15/063, G06N3/08, G10L21/10
Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data and generates a textogram based on the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
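The two input representations can be sketched in NumPy. The spectrogram is a standard log-magnitude short-time Fourier transform. The abstract does not define the textogram, so the `textogram` function below is a hypothetical construction: one one-hot row per character, repeated for a fixed number of frames so that text gets a time axis comparable to a spectrogram. The frame length, hop size, and `repeat` value are illustrative assumptions, not parameters taken from the patent.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude STFT spectrogram, shape (num_frames, frame_len // 2 + 1).
    Assumes len(signal) >= frame_len."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def textogram(text, alphabet, repeat=4):
    """Hypothetical textogram: a one-hot row for each character, repeated
    `repeat` times so the text has a frame-like time axis."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = np.zeros((len(text) * repeat, len(alphabet)))
    for pos, ch in enumerate(text):
        rows[pos * repeat:(pos + 1) * repeat, index[ch]] = 1.0
    return rows
```

A 1-second, 440 Hz tone sampled at 16 kHz yields 98 frames of 201 frequency bins, with the peak in bin 11 (40 Hz per bin).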
-
Publication No.: US20230081306A1
Publication date: 2023-03-16
Application No.: US17458772
Filing date: 2021-08-27
Abstract: Training data can be received, which can include pairs of speech and meaning representations associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into the spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representations having the reordered semantic entities. The meaning representation (e.g., semantic entities) in the received training data can be perturbed to create random-order sequence variations of the semantic entities associated with the speech. Perturbed meaning representations with their associated speech can augment the training data.
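The reordering and perturbation steps can be sketched as follows. The abstract does not name its alignment technique. In this sketch, a plain substring search over a transcript is a stand-in for it, and the availability of that transcript is itself an assumption. Entity tuples have the form `(label, value)`, and both function names are hypothetical.

```python
import random

def reorder_to_spoken_order(entities, transcript):
    """Sort (label, value) entities by where each value first appears in the
    transcript. Substring search is a stand-in for a real alignment; entities
    not found are moved to the end and keep their relative order."""
    text = transcript.lower()

    def position(entity):
        pos = text.find(entity[1].lower())
        return pos if pos >= 0 else len(text)

    return sorted(entities, key=position)  # sorted() is stable

def perturb(entities, num_variants, seed=0):
    """Random-order variants of the entity sequence, for data augmentation."""
    rng = random.Random(seed)
    variants = []
    for _ in range(num_variants):
        shuffled = list(entities)
        rng.shuffle(shuffled)
        variants.append(shuffled)
    return variants
```

For example, the entities `toloc=boston`, `fromloc=denver`, `date=monday` paired with "Flights from Denver to Boston on Monday" are reordered to `fromloc`, `toloc`, `date`.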
-
Publication No.: US11568858B2
Publication date: 2023-01-31
Application No.: US17073337
Filing date: 2020-10-17
Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low-resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of the transcribed training data is performed by processing a plurality of multilingual data types from the set of languages through the multilingual network and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of it for retraining the acoustic model. Data augmentation is performed by adding one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. A new multilingual acoustic model is then trained through the multilingual network using the updated training data.
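The filter-and-augment step can be sketched with a hypothetical per-utterance score standing in for the unspecified filtering metric. A decoder confidence value is one possible choice. The score, the `keep_fraction` parameter, and the tuple layouts are all illustrative assumptions.

```python
def select_and_augment(original, transliterated, keep_fraction=0.5):
    """original: list of (utterance_id, transcript) pairs.
    transliterated: list of (utterance_id, transcript, score) triples, where
    `score` is a hypothetical stand-in for the filtering metric.
    Keeps the top-scoring fraction and appends it to the original data."""
    ranked = sorted(transliterated, key=lambda item: item[2], reverse=True)
    keep = ranked[: int(len(ranked) * keep_fraction)]
    return original + [(uid, text) for uid, text, _ in keep]
```

The updated list would then serve as the training set for the new multilingual acoustic model.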
-
Publication No.: US20220110542A1
Publication date: 2022-04-14
Application No.: US17065936
Filing date: 2020-10-08
Abstract: Determining the lung capacity of a user includes capturing an audio waveform of the user performing an utterance presented to the user. A video of the user performing the utterance can also be captured. The captured audio waveform and video are analyzed for compliance. Based on the audio waveform, an indicator of respiratory function is determined. The indicator is compared with a reference indicator to determine the health of the user. A machine learning model such as a neural network can be trained to predict the indicator of respiratory function from input features comprising spectral and temporal characteristics of the utterance audio. Determining the indicator of respiratory function can include running the trained machine learning model.
-
Publication No.: US20220084508A1
Publication date: 2022-03-17
Application No.: US17021956
Filing date: 2020-09-15
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels with corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. The SLU model is trained on the entity labels, their values, and the intent labels of the corresponding speech recordings, without needing a transcript of those recordings.
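The extracted intent and entity labels can be turned into a transcript-free training target. The following is a minimal sketch of one way to do that. The token format (`INT-` prefix, bracketed entity tags) and the function name are hypothetical and do not come from the patent. The point is that the target carries only labels and values, never a verbatim transcript.

```python
def slu_target(intent, entities):
    """Hypothetical target sequence: the intent label first, then each entity
    value wrapped in its label tags. No verbatim transcript is required."""
    tokens = [f"INT-{intent}"]
    for label, value in entities:
        tokens += [f"<{label}>", value, f"</{label}>"]
    return " ".join(tokens)
```

For instance, `slu_target("flight", [("toloc", "boston")])` gives `"INT-flight <toloc> boston </toloc>"`.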
-
Publication No.: US20200034702A1
Publication date: 2020-01-30
Application No.: US16047287
Filing date: 2018-07-27
Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
Abstract: A student neural network may be trained by a computer-implemented method that includes selecting a teacher neural network from a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft label output generated by that teacher, and training the student neural network with at least the input data and the soft label output from the selected teacher neural network.
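One training step can be sketched with NumPy. A teacher is picked from a pool, its temperature-softened outputs serve as soft labels, and the student is scored by cross-entropy against them. The abstract does not say how the teacher is selected, so the random choice in `step` is an assumption. The temperature value and the representation of networks as plain callables returning logits are also illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft labels and the student's predictions."""
    soft_labels = softmax(teacher_logits, temperature)
    log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    return float(-(soft_labels * log_probs).sum(axis=-1).mean())

def step(x, teachers, student, rng):
    """Hypothetical selection rule: pick one teacher uniformly at random."""
    teacher = teachers[rng.integers(len(teachers))]
    return distillation_loss(student(x), teacher(x))
```

In a real system, the loss would be back-propagated through the student only, with the teacher kept fixed.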
-
Publication No.: US20190237090A1
Publication date: 2019-08-01
Application No.: US16379667
Filing date: 2019-04-09
IPC classification: G10L21/0208
CPC classification: G10L21/0208
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time-varying projection utilizing the clean dictionary and the noisy dictionary; denoising a second noisy signal utilizing the time-varying projection; and expanding the clean and noisy dictionaries by updating them to include new clean and new noisy spectro-temporal building blocks created from additional clean and noisy signals.
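The following is a rough exemplar-based sketch, under several assumptions. The clean and noisy dictionaries are taken to be paired: column k of each is the same building block with and without noise. The "time-varying projection" is read here as per-frame nonnegative weights that explain the noisy spectrogram with the noisy dictionary. Those weights are computed with KL-divergence NMF multiplicative updates while the dictionary stays fixed, then applied to the clean dictionary. This is one plausible interpretation, not the patented method.

```python
import numpy as np

def activations(noisy_spec, noisy_dict, iters=200, eps=1e-9):
    """Nonnegative per-frame weights H with noisy_spec ~= noisy_dict @ H,
    via multiplicative KL-NMF updates with the dictionary held fixed."""
    H = np.ones((noisy_dict.shape[1], noisy_spec.shape[1]))
    for _ in range(iters):
        ratio = noisy_spec / (noisy_dict @ H + eps)
        H *= (noisy_dict.T @ ratio) / (noisy_dict.sum(axis=0)[:, None] + eps)
    return H

def denoise(noisy_spec, clean_dict, noisy_dict):
    # Each column of H maps one frame onto the paired clean building blocks.
    return clean_dict @ activations(noisy_spec, noisy_dict)

def expand(clean_dict, noisy_dict, new_clean, new_noisy):
    """Append new paired spectro-temporal building blocks as extra columns."""
    return np.hstack([clean_dict, new_clean]), np.hstack([noisy_dict, new_noisy])
```

With a noisy dictionary that adds a constant noise floor to each clean atom, a mixture of noisy atoms is mapped back to the same mixture of clean atoms, and the noise floor disappears.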
-
Publication No.: US20190205748A1
Publication date: 2019-07-04
Application No.: US15860097
Filing date: 2018-01-02
IPC classification: G06N3/08
CPC classification: G06N3/08
Abstract: A technique for generating soft labels for training is disclosed. In the method, a teacher model having a teacher-side class set is prepared. A collection of class pairs for respective data units is obtained. Each class pair includes the classes labelled to a corresponding data unit from the teacher-side class set and from a student-side class set that differs from the teacher-side class set. A training input is fed into the teacher model to obtain a set of outputs for the teacher-side class set. A set of soft labels for the student-side class set is calculated from these outputs. For each member of the student-side class set, the calculation uses at least the outputs obtained for a subset of teacher-side classes relevant to that member, where relevance is based at least in part on observations in the collection of class pairs.
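One plausible reading of the soft-label calculation can be sketched in plain Python. Co-occurrence counts from the class pairs estimate P(s | t) for each student class s and teacher class t. Each student soft label is then q(s) = Σ_t P(s | t) · p(t), summed over the teacher classes that co-occurred with s (the "relevant subset"). The weighting scheme and the function name are assumptions, not details confirmed by the abstract.

```python
from collections import Counter

def student_soft_labels(teacher_probs, class_pairs, student_classes):
    """teacher_probs: dict mapping teacher class -> probability from the teacher.
    class_pairs: observed (teacher_class, student_class) labels per data unit.
    Each student class s gets sum_t P(s | t) * p(t) over the teacher classes t
    that co-occurred with s; P(s | t) is estimated from the pair counts."""
    pair_counts = Counter(class_pairs)
    teacher_totals = Counter(t for t, _ in class_pairs)
    labels = {}
    for s in student_classes:
        labels[s] = sum(
            count / teacher_totals[t] * teacher_probs.get(t, 0.0)
            for (t, s2), count in pair_counts.items() if s2 == s)
    return labels
```

Because each P(· | t) sums to 1 over the student classes, the resulting soft labels sum to 1 whenever the teacher probabilities do.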
-
Publication No.: US10230922B2
Publication date: 2019-03-12
Application No.: US15722704
Filing date: 2017-10-02
Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices includes: forming a communication link with at least one of one or more personal mobile devices; receiving an audio data stream and/or a video data stream from that personal mobile device; determining the quality of the received audio and/or video data stream, where any stream with a quality above a threshold quality is retained; and combining the retained audio and/or video data stream with the data streams from the fixed audio-visual sensors.
-
Publication No.: US20180047409A1
Publication date: 2018-02-15
Application No.: US15793884
Filing date: 2017-10-25
IPC classification: G10L21/0224, G10L25/24, G10L21/0388
CPC classification: G10L21/0208
Abstract: A computer-implemented method according to one embodiment includes creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time-varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time-varying projection.