-
Publication No.: WO2023036017A1
Publication date: 2023-03-16
Application No.: PCT/CN2022/115883
Filing date: 2022-08-30
Applicant: 广西电网有限责任公司贺州供电局
Inventor: 莫梓樱 , 朱明增 , 覃秋勤 , 吕鸣 , 刘小兰 , 陈极万 , 韩竞 , 李和峰 , 蒋志儒 , 覃景涛 , 黄金 , 卢迎 , 韦晓明 , 李梅 , 周素君 , 梁维 , 罗晨怡 , 梁豪 , 奉华
IPC: G10L15/16 , G10L15/02 , G10L15/26 , G10L25/24 , G10L15/05 , G10L15/06 , G10L15/20 , G10L21/0208 , G06N3/08 , G06Q50/06
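Abstract: A speech recognition method and system for power grid dispatching. The method comprises: acquiring an original speech signal from power grid dispatching; performing noise-reduction preprocessing on the original speech signal; applying a fast Fourier transform (FFT) to the noise-reduced signal; extracting features from the FFT-transformed signal using Mel-frequency cepstral coefficients (MFCC); combining a deep neural network (DNN) with a long short-term memory (LSTM) network to obtain a combined DNN-LSTM algorithm, and using that algorithm to train an acoustic model on the extracted features; and using a decoder to find the best text output based on the acoustic model output, a speech model, and a dictionary.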
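The spectral-transform and MFCC steps named in the abstract above can be illustrated with a minimal sketch. This is not the applicant's implementation: the frame length, filter count, coefficient count, and all function names are assumptions chosen for illustration, and a naive DFT stands in for the FFT so that the example needs only the Python standard library. Noise reduction and the DNN-LSTM acoustic model are not shown.

```python
import cmath
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame.
    Uses a naive DFT in place of an FFT (assumption, for brevity)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=12):
    """Spectrum -> mel filterbank energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_e))
            for c in range(num_coeffs)]

# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, 8000)
```

In a full pipeline, the per-frame coefficient vectors would be stacked over time and passed to the acoustic model as its input features.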
-
Publication No.: WO2023277886A1
Publication date: 2023-01-05
Application No.: PCT/US2021/039662
Filing date: 2021-06-29
Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Inventor: DA FONTE LOPES DA SILVA, Andre , OZAKI, Carol Tatsuko
IPC: G10L15/20 , G10L21/0232 , G10L25/84
Abstract: In some examples, a method includes initiating an audio session. In some examples, the method includes receiving a classification for the audio session. In some examples, the method includes automatically enabling inbound noise removal when the classification is voice.
-
Publication No.: WO2022271746A1
Publication date: 2022-12-29
Application No.: PCT/US2022/034407
Filing date: 2022-06-21
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: WENINGER, Felix , GAUDESI, Marco , LEIBOLD, Ralf , ZHAN, Puming
IPC: G10L15/34 , G10L15/26 , G10L15/20 , G10L15/22 , G10L15/04 , G10L19/02 , G10L21/0208 , G10L25/24
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of a short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC), and a filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects whichever of the close-talk encoder and the far-talk encoder better recognizes the speech segment. An encoder-decoder model is used to produce ASR output.
-
Publication No.: WO2022267405A1
Publication date: 2022-12-29
Application No.: PCT/CN2021/140759
Filing date: 2021-12-23
Applicant: 达闼机器人股份有限公司
Inventor: 李翠姣
IPC: G10L15/20 , G10L15/02 , G10L15/06 , G10L15/16 , G10L15/18 , G10L15/063 , G10L15/1822 , G10L15/183 , G10L15/22 , G10L15/26 , G10L2015/223 , G10L2015/225
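Abstract: Embodiments of the present application relate to the field of voice interaction technology and provide a voice interaction method, system, electronic device, and storage medium. The voice interaction method comprises: obtaining text information produced by automatic speech recognition (ASR) processing of a speech signal, where the speech signal is a sound signal captured from the environment; extracting features from the text information to obtain a feature vector of the text information; inputting the feature vector into a trained meaningless-text recognition model and determining, from the model's output, whether the text information is meaningless text, where meaningless text is text that does not conform to conventional modes of expression; and, if the text information is not meaningless text, responding to the text information after a trained response-decision model detects that a response is needed.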
-
Publication No.: WO2022254912A1
Publication date: 2022-12-08
Application No.: PCT/JP2022/014683
Filing date: 2022-03-25
Applicant: 株式会社NTTドコモ
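Abstract: The aim is to provide a speech recognition device that does not output recognition results for non-verbal sounds. The speech recognition device 100 comprises: a speech acquisition unit 101 (sound information acquisition unit) that acquires a speech waveform signal (sound information) in predetermined units obtained by voice activity detection; and a sound information processing unit that outputs a result based on speech recognition processing when the speech waveform signal, i.e. the sound information, is speech, and performs processing so as not to output a result based on speech recognition processing when the sound information is non-verbal. In the present disclosure, the sound information processing unit consists of a speech recognition unit 102, a non-verbal speech recognition unit 103, a score determination unit 104, and a result output unit 105.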
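One way the four units listed in the abstract above might fit together is sketched below. The abstract does not state how scores are compared, so the comparison rule, the `margin` parameter, and the names `Hypothesis` and `select_output` are all hypothetical; this is an illustration of the general idea, not the applicant's method.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hypothesis:
    """A recognizer's best guess and its confidence score (hypothetical type)."""
    text: str
    score: float

def select_output(speech: Hypothesis, nonverbal: Hypothesis,
                  margin: float = 0.0) -> Optional[str]:
    """Assumed score-determination rule: suppress the transcript when the
    non-verbal recognizer scores higher than the speech recognizer by
    more than `margin`; otherwise pass the speech transcript through."""
    if nonverbal.score > speech.score + margin:
        return None  # treated as non-verbal sound: output nothing
    return speech.text

# Usage: a spoken segment is emitted, a cough-like segment is suppressed.
spoken = select_output(Hypothesis("hello", 0.9), Hypothesis("<cough>", 0.2))
cough = select_output(Hypothesis("ah", 0.3), Hypothesis("<cough>", 0.8))
```

Returning `None` rather than an empty string lets a caller distinguish "nothing recognized as speech" from a genuinely empty transcript.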
-
Publication No.: WO2022107393A1
Publication date: 2022-05-27
Application No.: PCT/JP2021/027243
Filing date: 2021-07-20
Inventor: ZHENG Changxi , XU Ruilin , WU Rundi , VONDRICK Carl , ISHIWAKA Yuko
IPC: G10L21/0208 , G10L21/0216 , G10L21/0232 , G10L21/0308 , G10L15/04 , G10L15/16 , G10L15/20
Abstract: Disclosed are methods, systems, devices, and other implementations, including a method that includes: receiving an audio signal representation; detecting in the received audio signal representation, using a first learning model, one or more silent intervals with reduced foreground sound levels; determining, based on the detected one or more silent intervals, an estimated full noise profile corresponding to the audio signal representation; and generating with a second learning model, based on the received audio signal representation and on the determined estimated full noise profile, a resultant audio signal representation with a reduced noise level.
-
Publication No.: WO2021218843A1
Publication date: 2021-11-04
Application No.: PCT/CN2021/089556
Filing date: 2021-04-25
Applicant: 阿里巴巴集团控股有限公司
IPC: G10L15/20
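Abstract: A streaming end-to-end speech recognition method, apparatus, and electronic device. The method comprises: extracting and encoding acoustic features from a received speech stream frame by frame (S301); partitioning the encoded frames into blocks and predicting the number of activation points within each block that require encoded output (S302); and determining, from the prediction result, the positions of the activation points that require decoded output, so that the decoder decodes at those positions and outputs the recognition result (S303). The method can improve the noise robustness of a streaming end-to-end speech recognition system, and thereby its performance and accuracy.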
-
Publication No.: WO2021216679A1
Publication date: 2021-10-28
Application No.: PCT/US2021/028358
Filing date: 2021-04-21
Applicant: SAINT LOUIS UNIVERSITY
Inventor: BUCHOLZ, Richard D.
Abstract: Provided herein are systems and methods to verbally control a host computer for data entry into the computer, such as for electronic medical record data entry. The verbal interface system may be operable to receive, via a connection interface, at least one video capture of the host computer display; analyze the at least one video capture to determine a plurality of data entry fields displayed on the host computer display; receive, via a microphone, a verbal command correlating to at least one of the plurality of data entry fields; and autonomously control the mouse and/or the keyboard to perform mouse and/or text input into the host computer.
-
Publication No.: WO2021211878A1
Publication date: 2021-10-21
Application No.: PCT/US2021/027532
Filing date: 2021-04-15
Applicant: SCHWARTZ, Bernard J.
Inventor: SCHWARTZ, Bernard J.
Abstract: Devices for preventing unintended conversation from being recorded by a voice activated assistant device/application (VAD) are disclosed. The device is contoured to fit over a functional surface of a VAD that typically includes a plurality of microphones and control buttons. The device covers the VAD microphones and uses its own microphones to monitor for an authorization input signal. In an embodiment, the device uses speakers aligned with and opposing each VAD microphone, and emits interfering audible signals through them while monitoring. Once the device senses an authorization input, it decouples its speakers from the interfering audible signal and instead passes the audio from its own microphones through to the VAD. During this mode, the VAD is in normal operation.