-
Publication No.: US12230275B2
Publication Date: 2025-02-18
Application No.: US17611436
Filing Date: 2021-01-06
Applicant: BOE Technology Group Co., Ltd.
Inventor: Shaoxun Su
Abstract: A speech instruction recognition method, an electronic device, and a non-transient computer readable storage medium. The speech instruction recognition method comprises: acquiring a target speech; processing the target speech to obtain a target speech vector corresponding to the target speech; performing speech recognition on the target speech to obtain a target speech text of the target speech, and processing the target speech text to obtain a target text vector corresponding to the target speech text; and inputting the target speech vector and the target text vector to a pre-trained instruction recognition model to obtain an instruction category corresponding to the target speech.
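The abstract's final step feeds two vectors, one derived from the audio and one from the recognized text, into a single instruction recognition model. The patent abstract does not say how the vectors are combined or what the model is. The minimal sketch below assumes plain concatenation as the fusion step and a hypothetical linear softmax classifier with hand-set weights, purely to show the data flow.

```python
import math
from typing import List, Sequence

def fuse(speech_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Concatenate the speech vector and the text vector (assumed fusion method)."""
    return list(speech_vec) + list(text_vec)

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases, labels):
    """Hypothetical linear classifier: one weight row and one bias per instruction category."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy example: made-up 2-dim speech and text vectors, two hypothetical categories.
speech_vec = [0.9, 0.1]
text_vec = [0.8, 0.0]
weights = [[1.0, 0.0, 1.0, 0.0],   # "play_music"
           [0.0, 1.0, 0.0, 1.0]]   # "stop"
biases = [0.0, 0.0]
label, probs = classify(fuse(speech_vec, text_vec), weights, biases, ["play_music", "stop"])
print(label)  # play_music
```

In a trained system the weights would be learned; concatenation is only one of several plausible fusion choices.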
-
Publication No.: US20240331720A1
Publication Date: 2024-10-03
Application No.: US18191763
Filing Date: 2023-03-28
Applicant: Adobe Inc. , The Trustees of Princeton University
Inventor: Zeyu JIN , Jiaqi SU , Adam FINKELSTEIN
IPC: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60
CPC classification number: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60 , G10L21/0364 , G10L25/30
Abstract: Embodiments are disclosed for converting audio data to studio quality audio data. The method includes obtaining an audio data having a first quality for conversion to studio quality audio. A first machine learning model predicts a set of acoustic features. A spectral mask is applied to the audio data during the prediction of the set of acoustic features. A second machine learning model generates studio quality audio from the set of acoustic features and the audio data.
-
Publication No.: US12067130B2
Publication Date: 2024-08-20
Application No.: US17525302
Filing Date: 2021-11-12
Applicant: The Toronto-Dominion Bank
Inventor: Alexey Shpurov , Milos Dunjic , Brian Andrew Lam
CPC classification number: G06F21/602 , G06N20/00 , G10L15/22 , G10L25/24 , H04L9/006 , H04L9/008 , G10L2015/223
Abstract: The disclosed exemplary embodiments include computer-implemented systems, devices, apparatuses, and processes that maintain data confidentiality in communications involving voice-enabled devices in a distributed computing environment using homomorphic encryption. By way of example, an apparatus may receive encrypted command data from a computing system, decrypt the encrypted command data using a homomorphic private key, and perform operations that associate the decrypted command data with a request for an element of data. Using a public cryptographic key associated with a device, the apparatus may generate an encrypted response that includes the requested data element and transmit the encrypted response to the device. The device may decrypt the encrypted response using a private cryptographic key and perform operations that present first audio content representative of the requested data element through an acoustic interface.
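The abstract names homomorphic encryption but not a specific scheme. As an illustration only, the toy sketch below uses the Paillier cryptosystem (a swapped-in, additively homomorphic scheme, not necessarily the one the patent uses) with tiny, insecure primes. It shows the property that makes such schemes useful here: a party holding only ciphertexts can combine them, and decryption yields the result of the corresponding plaintext operation.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation; p and q are distinct primes (tiny here, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # common simplified generator choice
    mu = pow(lam, -1, n)           # modular inverse; valid because g = n + 1 (Python 3.8+)
    return (n, g), (lam, mu, n)

def encrypt(pub, m: int, rng=random) -> int:
    n, g = pub
    n2 = n * n
    while True:                    # pick a random r coprime to n
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c: int) -> int:
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then multiply by mu

def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen(47, 59)
c1, c2 = encrypt(pub, 120), encrypt(pub, 35)
print(decrypt(priv, add_encrypted(pub, c1, c2)))  # 155
```

Real deployments use primes of 1024 bits or more and a vetted library; the values above exist only to make the arithmetic easy to follow.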
-
Publication No.: US11996115B2
Publication Date: 2024-05-28
Application No.: US17435761
Filing Date: 2019-12-18
Applicant: NEC Corporation
Inventor: Mitsuru Sendoda
IPC: G10L25/24 , G10L21/0208 , G10L25/18 , G10L25/51
Abstract: A sound processing apparatus includes a feature value extractor configured to perform a Fourier transform and then a cepstral analysis of a sound signal and to extract, as feature values of the sound signal, values including frequency components obtained by the Fourier transform of the sound signal and a value based on a result obtained by the cepstral analysis of the sound signal.
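The feature extraction described above (a Fourier transform, then a cepstral analysis, then a feature vector combining frequency components with a cepstrum-derived value) can be sketched as follows. This is a minimal stdlib-only sketch: it uses a naive DFT, computes the real cepstrum as the inverse DFT of the log magnitude spectrum, and assumes (the abstract does not say) that the cepstrum-derived value is the largest cepstral coefficient outside quefrency 0.

```python
import cmath
import math
from typing import List, Sequence

def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform, O(N^2); adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X: Sequence[complex]) -> List[complex]:
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def extract_features(frame: Sequence[float], eps: float = 1e-12) -> List[float]:
    spectrum = dft(frame)
    half = len(frame) // 2 + 1
    magnitudes = [abs(c) for c in spectrum[:half]]        # frequency components
    log_mag = [math.log(abs(c) + eps) for c in spectrum]   # full spectrum for the inverse
    cepstrum = [c.real for c in idft(log_mag)]             # real cepstrum
    cep_peak = max(cepstrum[1:half])                       # assumed cepstral summary value
    return magnitudes + [cep_peak]

# Usage: a 32-sample sine wave that completes 4 cycles in the frame.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
features = extract_features(frame)
print(len(features))  # 18
```

A production implementation would use an FFT library and windowed frames; the naive transforms keep the sketch dependency-free.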
-
Publication No.: US11996112B2
Publication Date: 2024-05-28
Application No.: US17084672
Filing Date: 2020-10-30
Applicant: UBTECH ROBOTICS CORP LTD
Inventor: Ruotong Wang , Zhichao Tang , Dongyan Huang , Jiebin Xie , Zhiyuan Zhao , Yang Liu , Youjun Xiong
CPC classification number: G10L21/013 , G06N20/00 , G10L19/02 , G10L25/03 , G10L25/27 , G10L2021/0135
Abstract: The present disclosure discloses a voice conversion method. The method includes: obtaining a to-be-converted voice, and extracting acoustic features of the to-be-converted voice; obtaining a source vector corresponding to the to-be-converted voice from a source vector pool, and selecting a target vector corresponding to the target voice from the target vector pool; obtaining acoustic features of the target voice output by the voice conversion model by using the acoustic features of the to-be-converted voice, the source vector corresponding to the to-be-converted voice, and the target vector corresponding to the target voice as an input of the voice conversion model; and obtaining the target voice by converting the acoustic features of the target voice using a vocoder. In addition, a voice conversion apparatus and a storage medium are also provided.
-
Publication No.: US20240169975A1
Publication Date: 2024-05-23
Application No.: US18425381
Filing Date: 2024-01-29
Inventor: Yan Nan WANG , Jun Huang
CPC classification number: G10L15/02 , G10L15/063 , G10L15/16 , G10L25/24
Abstract: A speech processing method, performed by an electronic device, includes determining a first speech feature and a first text bottleneck feature based on to-be-processed speech information, determining a first combined feature vector based on the first speech feature and the first text bottleneck feature, inputting the first combined feature vector to a trained unidirectional long short-term memory (LSTM) model, performing speech processing on the first combined feature vector to obtain speech information after noise reduction, and transmitting the obtained speech information after noise reduction to another electronic device for playing.
-
Publication No.: US11948690B2
Publication Date: 2024-04-02
Application No.: US16716206
Filing Date: 2019-12-16
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ebrahim Nematihosseinabadi , Md M. Rahman , Viswam Nathan , Korosh Vatanparvar , Jilong Kuang , Jun Gao
Abstract: Pulmonary function estimation can include detecting one or more cough events from a time series of audio signals generated by an electronic device of a user. Based on the one or more cough events, one or more lung function metrics of the user can be determined.
-
Publication No.: US11875775B2
Publication Date: 2024-01-16
Application No.: US17430793
Filing Date: 2021-04-20
Inventor: Huapeng Sima , Zhiqiang Mao , Xuefei Gong
CPC classification number: G10L15/063 , G10L15/16 , G10L25/24
Abstract: The present disclosure proposes a speech conversion scheme trained on a non-parallel corpus, which removes the dependence on parallel text and addresses the difficulty of performing speech conversion when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition (ASR) model can be used for any source speaker; and bottleneck features of audio are more abstract than phonetic posteriorgram (PPG) features, reflect the decoupling of spoken content from speaker timbre, and are neither closely bound to phoneme classes nor in a strict one-to-one correspondence with them. In this way, the problem of inaccurate pronunciation caused by ASR recognition errors is mitigated to some extent. Audio obtained by voice conversion using bottleneck features has markedly higher pronunciation accuracy than audio from a phonetic-posteriorgram-based method, with no significant difference in timbre. Through transfer learning, dependence on training corpus can be greatly reduced.
-
Publication No.: US11848006B2
Publication Date: 2023-12-19
Application No.: US17000892
Filing Date: 2020-08-24
Applicant: STMicroelectronics S.r.l.
Inventor: Nunziata Ivana Guarneri , Filippo Naccari
CPC classification number: G10L15/083 , G10L15/04 , G10L15/16 , G10L15/22 , G10L25/24 , G10L2015/088
Abstract: A method of processing an electrical signal transduced from a voice signal is disclosed. A classification model is applied to the electrical signal to produce a classification indicator. The classification model has been trained using an augmented training dataset. The electrical signal is classified into either a first class or a second class of a binary classification, and the classification is performed as a function of the classification indicator. A trigger signal is provided to a user circuit as a result of the electrical signal being classified in the first class of the binary classification.
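The decision flow in this abstract (model produces an indicator, the indicator determines one of two classes, and only the first class fires a trigger) can be sketched as below. The threshold rule and the stand-in model (mean absolute amplitude) are hypothetical placeholders; the abstract does not specify the model, the decision function, or how the training data was augmented.

```python
from typing import Callable, Sequence

FIRST_CLASS, SECOND_CLASS = 1, 2

def binary_classify(indicator: float, threshold: float = 0.5) -> int:
    """Assumed decision rule: indicator at or above the threshold -> first class."""
    return FIRST_CLASS if indicator >= threshold else SECOND_CLASS

def process(signal: Sequence[float],
            model: Callable[[Sequence[float]], float],
            trigger: Callable[[], None],
            threshold: float = 0.5) -> int:
    indicator = model(signal)               # classification indicator
    label = binary_classify(indicator, threshold)
    if label == FIRST_CLASS:
        trigger()                           # trigger signal to the user circuit
    return label

def toy_model(signal: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: mean absolute amplitude."""
    return sum(abs(s) for s in signal) / len(signal)

events = []
label = process([0.9, -0.8, 0.7], toy_model, lambda: events.append("wake"))
print(label, events)  # 1 ['wake']
```

Passing the trigger as a callback keeps the classifier independent of whatever circuit or handler consumes the trigger signal.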
-
Publication No.: US11676579B2
Publication Date: 2023-06-13
Application No.: US17073149
Filing Date: 2020-10-16
Applicant: Deepgram, Inc.
Inventor: Jeff Ward , Adam Sypniewski , Scott Stephenson
IPC: G10L15/16 , G10L15/06 , G06N3/084 , G10L25/18 , G10L25/24 , G06V10/44 , G06F18/214 , G06F18/2413 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G10L15/02 , G10L15/22 , G10L15/30 , G10L15/197 , G10L15/08
CPC classification number: G10L15/16 , G06F18/214 , G06F18/24133 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G06N3/084 , G06V10/454 , G10L15/02 , G10L15/063 , G10L15/22 , G10L15/30 , G10L25/18 , G10L25/24 , G10L15/197 , G10L2015/0635 , G10L2015/081
Abstract: Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
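The search use case above can be sketched as: run inputs through a network, keep the activations of a chosen subset of nodes as the representation, and return the stored representation most similar to the query's. In this minimal sketch the single ReLU layer, its weights, the node subset, and cosine similarity as the similarity measure are all assumptions for illustration; the abstract does not fix any of them.

```python
import math
from typing import Dict, List, Sequence

def layer_activations(x: Sequence[float], weights, biases) -> List[float]:
    """Output activations (ReLU) of every node in one hypothetical dense layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def internal_state(x, weights, biases, node_subset: Sequence[int]) -> List[float]:
    """Keep only the activations of the chosen subset of nodes."""
    acts = layer_activations(x, weights, biases)
    return [acts[i] for i in node_subset]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def search(query: List[float], index: Dict[str, List[float]]) -> str:
    """Return the key of the stored representation most similar to the query."""
    return max(index, key=lambda k: cosine(query, index[k]))

weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
biases = [0.0, 0.0, 0.0]
subset = [0, 2]                                   # hypothetical node subset
stored = {"clip_a": [0.9, 0.1], "clip_b": [0.1, 0.9]}
index = {name: internal_state(x, weights, biases, subset) for name, x in stored.items()}
query = internal_state([1.0, 0.2], weights, biases, subset)
print(search(query, index))  # clip_a
```

A linear scan is enough for a sketch; large collections would typically use an approximate nearest-neighbor index instead.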