-
91.
Publication No.: WO2022197296A1
Publication date: 2022-09-22
Application No.: PCT/US2021/022823
Filing date: 2021-03-17
Applicant: INNOPEAK TECHNOLOGY, INC.
Inventor: LIU, Rongrong , LIN, Yuan
Abstract: This application is directed to audio purification. An electronic device obtains image data corresponding to a sequence of image frames that focus on lip movement of a person. The electronic device also obtains audio data that is synchronous with the lip movement in the sequence of image frames and modifies the audio data using the image data, thereby reducing background noise in the audio data. In some embodiments, the audio data is separated to first audio magnitude data and first audio phase data corresponding to distinct audio frequencies. The first audio magnitude data are modified to second audio magnitude data based on the image data. The first audio phase data are updated to second audio phase data based on the second audio magnitude data. The audio data is modified when the audio data are recovered from the second audio magnitude data and the second audio phase data.
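The magnitude/phase split described in this abstract can be illustrated with a minimal sketch. It assumes a plain discrete Fourier transform over one short frame, and the per-frequency `mask` is a hypothetical stand-in for whatever gains the image-driven model would produce. The abstract also updates the phase from the new magnitudes; this sketch simply reuses the original phase.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; only the real part is kept because the input audio is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_magnitude_phase(frame):
    """Separate a frame into per-frequency magnitude data and phase data."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recover(magnitudes, phases):
    """Rebuild the time-domain frame from magnitude and phase data."""
    return idft([cmath.rect(m, p) for m, p in zip(magnitudes, phases)])


def purify(frame, mask):
    """Scale each frequency's magnitude by a hypothetical gain in [0, 1]."""
    magnitudes, phases = split_magnitude_phase(frame)
    second_magnitudes = [m * g for m, g in zip(magnitudes, mask)]
    # Assumption: the original phase is reused instead of being re-estimated.
    return recover(second_magnitudes, phases)
```

With a mask of all ones the frame is returned unchanged (up to rounding), and zeroing selected bins removes the corresponding frequency components.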
-
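Publication No.: WO2022176085A1
Publication date: 2022-08-25
Application No.: PCT/JP2021/006024
Filing date: 2021-02-18
Applicant: 三菱電機株式会社 (Mitsubishi Electric Corporation)
Abstract: The in-vehicle speech separation device according to the present disclosure includes: an utterance level calculation unit (40) that calculates an utterance level based on acquired in-vehicle audio; a dialogue request score calculation unit (20) that calculates a dialogue request score for each seat based on acquired information required for detecting a dialogue request; and a voice input right determination unit (60) that determines, based on the utterance level and the dialogue request score, which of the seats in the vehicle is granted the voice input right.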
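As a rough illustration of the final decision step, the sketch below combines a per-seat utterance level with a per-seat dialogue request score and grants the voice input right to the best seat. The weighted sum, the `weight` and `min_score` parameters, and the assumption that both inputs are normalised to [0, 1] are hypothetical; the abstract does not say how the two values are combined.

```python
def grant_voice_input_right(utterance_levels, request_scores, weight=0.5, min_score=0.5):
    """Return the seat that should receive the voice input right, or None.

    utterance_levels and request_scores map a seat name to a value in [0, 1].
    The linear combination and the minimum score are illustrative assumptions.
    """
    best_seat, best_score = None, min_score
    for seat, level in utterance_levels.items():
        combined = weight * level + (1.0 - weight) * request_scores.get(seat, 0.0)
        if combined > best_score:
            best_seat, best_score = seat, combined
    return best_seat
```

A loud occupant with a low dialogue request score (for example, someone talking to another passenger) can therefore lose out to a quieter occupant who is clearly addressing the system.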
-
Publication No.: WO2022083968A1
Publication date: 2022-04-28
Application No.: PCT/EP2021/076240
Filing date: 2021-09-23
Applicant: THOMSON LICENSING
Inventor: NADEAU, Pascal , GILBERTON, Philippe , GAUTIER, Eric , DELAUNAY, Christophe
Abstract: The disclosure relates to a method and device for detecting an audio adversarial attack with respect to a voice command (VC) processed by an automatic speech recognition system (ASR). The method is implemented by a detection device connected to the automatic speech recognition system. The method includes: obtaining (11) an audio signal associated with the voice command; performing (12) a phonetic transcription of the audio signal, according to a phonetic transcription scheme, delivering a first character string (CS1); obtaining (13) a transcript resulting from the processing, by the automatic speech recognition system, of the audio signal; performing (14) a phonetic transcription of the transcript, according to the phonetic transcription scheme, delivering a second character string (CS2); computing (15) a similarity score (SS) between the first character string (CS1) and the second character string (CS2); and delivering (16) a piece of data representative of a detection of an audio adversarial attack, as a function of a result of a comparison between the similarity score (SS) and a predetermined threshold.
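The comparison in steps (14) to (16) can be sketched as follows. This is only an illustration: `toy_phonetic` is a hypothetical, very crude stand-in for a real phonetic transcription scheme, the similarity score is taken from Python's `difflib.SequenceMatcher`, and the threshold of 0.6 is an arbitrary assumption. The first character string (CS1) is passed in directly because producing it requires an audio front end.

```python
from difflib import SequenceMatcher


def toy_phonetic(text):
    """Hypothetical phonetic transcription: lowercase letters with a few merges."""
    s = "".join(ch for ch in text.lower() if ch.isalpha())
    for src, dst in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")):
        s = s.replace(src, dst)
    return s


def similarity_score(cs1, cs2):
    """Similarity in [0, 1] between the two phonetic character strings."""
    return SequenceMatcher(None, cs1, cs2).ratio()


def is_adversarial(cs1, asr_transcript, threshold=0.6):
    """Flag an attack when the ASR transcript sounds unlike the audio itself."""
    cs2 = toy_phonetic(asr_transcript)
    return similarity_score(cs1, cs2) < threshold
```

The idea is that an adversarial perturbation changes what the ASR system outputs without changing what the audio sounds like, so the two phonetic strings diverge.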
-
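Publication No.: WO2021249284A1
Publication date: 2021-12-16
Application No.: PCT/CN2021/098173
Filing date: 2021-06-03
Applicant: 中国民航大学 (Civil Aviation University of China)
Inventor: 诸葛晶昌
IPC: G10L15/22 , G10L15/20 , G10L15/26 , G10L15/16 , G10L15/063
Abstract: An airport control decision support system and method based on semantic recognition of controller instructions. The system comprises a voice acquisition module, a noise processing module, a speech recognition module, a semantic recognition module, a conflict identification module, and a display and alarm terminal. The system can effectively eliminate accidents and incidents caused by human factors during control and can improve the safety of aircraft ground operations. Unlike general speech and semantic recognition, it targets the pronunciation specific to air traffic control: speech intonation data are annotated to build a speech corpus that conforms to standard airport control phraseology. It requires no surface movement radar assistance, does not rely on an advanced surface movement guidance and control system, and needs no equipment installed or modified outside the controller position. Only a voice acquisition device and a display and alarm terminal are installed at the controller position, making it an economical and practical airport control decision support system.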
-
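Publication No.: WO2021246304A1
Publication date: 2021-12-09
Application No.: PCT/JP2021/020361
Filing date: 2021-05-28
Applicant: ソニーグループ株式会社 (Sony Group Corporation)
Inventor: 平野 将人
Abstract: For example, to improve the accuracy of speech recognition. A signal processing device comprising: a single-utterance detection unit that detects whether a one-channel input audio signal is an utterance by a single speaker; a cluster information update unit that updates cluster information based on speech features when the input audio signal is an utterance by a single speaker; a speech interval detection unit that detects the utterance intervals of a target speaker based on the cluster information; and a speech extraction unit that extracts only the target speaker's speech signal from a mixed audio signal containing the target speaker's speech.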
-
Publication No.: WO2021222678A1
Publication date: 2021-11-04
Application No.: PCT/US2021/030049
Filing date: 2021-04-30
Applicant: GOOGLE LLC
Inventor: TRIPATHI, Anshuman , LU, Han , SAK, Hasim
Abstract: A method (400) for training a speech recognition model (200) with a loss function (310) includes receiving an audio signal (202) including a first segment (304) corresponding to audio spoken by a first speaker (10), a second segment corresponding to audio spoken by a second speaker, and an overlapping region (306) where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding (254) for each of the first and second speakers. The method also includes applying a masking loss (312) after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
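The rule for where the masking loss is applied can be sketched as below. The abstract does not give the form of the loss, so this sketch assumes a mean squared magnitude over the penalised frames, which pushes the speaker's masked embedding toward zero where that speaker is known to be silent. The function and parameter names are hypothetical.

```python
def masking_loss(masked_embedding, frame_times, overlap_start, overlap_end, spoke_first):
    """Penalise energy in a speaker's masked embedding outside their turn.

    masked_embedding: list of frames, each a list of floats.
    frame_times: one timestamp per frame.
    spoke_first: True if this speaker was speaking before the overlap began.
    Assumption: the loss is the mean squared magnitude of the penalised frames.
    """
    if spoke_first:
        # The speaker finished during the overlap, so penalise frames after it.
        penalised = [f for f, t in zip(masked_embedding, frame_times) if t > overlap_end]
    else:
        # The speaker started during the overlap, so penalise frames before it.
        penalised = [f for f, t in zip(masked_embedding, frame_times) if t < overlap_start]
    if not penalised:
        return 0.0
    return sum(v * v for frame in penalised for v in frame) / len(penalised)
```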
-
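Publication No.: WO2021189979A1
Publication date: 2021-09-30
Application No.: PCT/CN2020/136364
Filing date: 2020-12-15
Applicant: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Abstract: A speech enhancement method and apparatus, a computer device, and a storage medium, relating to the field of artificial intelligence and applicable to speech enhancement processing of speech data. Speech enhancement parameters matching the surrounding environment can be selected automatically from a pre-built speech enhancement parameter set; after the speech data to be recognized is enhanced with these parameters, speech recognition accuracy can be maximized. The method comprises: acquiring speech data to be processed (101); extracting a first speech feature corresponding to the speech data, determining from the first speech feature the target environment of the speech data, and selecting from the pre-built speech enhancement parameter set the target speech enhancement parameters corresponding to the target environment, the speech enhancement parameters being used to improve speech recognition accuracy in different environments (102); and performing speech enhancement on the speech data according to the target speech enhancement parameters to obtain enhanced speech data (103).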
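Steps (101) to (103) can be sketched as a lookup followed by an enhancement pass. Everything concrete here is an assumption made for illustration: the first speech feature is taken to be the RMS level, the environments and their centroids are invented, and the "enhancement" is a simple noise gate with gain rather than whatever processing the application actually uses.

```python
import math

# Hypothetical pre-built parameter set, keyed by environment.
ENHANCEMENT_PARAMS = {
    "quiet": {"noise_floor": 0.01, "gain": 1.0},
    "street": {"noise_floor": 0.10, "gain": 1.5},
}

# Hypothetical typical RMS level of each environment.
ENVIRONMENT_CENTROIDS = {"quiet": 0.02, "street": 0.20}


def first_speech_feature(samples):
    """Assumed feature: root-mean-square level of the speech data."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def target_environment(feature):
    """Pick the environment whose centroid is closest to the feature."""
    return min(ENVIRONMENT_CENTROIDS, key=lambda env: abs(ENVIRONMENT_CENTROIDS[env] - feature))


def enhance(samples):
    """Select parameters for the detected environment and apply them."""
    env = target_environment(first_speech_feature(samples))
    params = ENHANCEMENT_PARAMS[env]
    enhanced = [0.0 if abs(x) < params["noise_floor"] else x * params["gain"]
                for x in samples]
    return env, enhanced
```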
-
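Publication No.: WO2021177049A1
Publication date: 2021-09-10
Application No.: PCT/JP2021/006156
Filing date: 2021-02-18
Applicant: 菱洋エレクトロ株式会社 (Ryoyo Electro Corporation)
Abstract: [Problem] To provide a speech recognition system and a speech recognition device that reduce the recognition time for speech data transmitted by wireless communication. [Solution] A speech recognition system 100 using wireless communication W comprises acquisition means, transmission means, reception means, and recognition means. The acquisition means acquires speech data based on speech. The transmission means transmits the speech data by wireless communication using the UHF band. The reception means receives the speech data all at once as a continuous signal. The recognition means uses phoneme recognition to derive a recognition result in which the content of the speech data is recognized. For example, the transmission means transmits the speech data without packetization processing.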
-
Publication No.: WO2021147018A1
Publication date: 2021-07-29
Application No.: PCT/CN2020/073882
Filing date: 2020-01-22
Applicant: QUALCOMM INCORPORATED , BAO, Xiaoming , WANG, Jingbin
Inventor: BAO, Xiaoming , WANG, Jingbin
IPC: G10L15/20
Abstract: A method performed by an electronic device is described. The method includes determining an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal. The method also includes comparing the ambient noise level with a noise threshold. The method additionally includes selecting, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user. The method further includes determining whether to enter an active mode based on the selected verification threshold.
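The threshold selection described in this abstract can be sketched in a few lines. The abstract does not define how the ambient noise level is derived from the two estimates or which way the verification threshold moves, so this sketch assumes the ambient level is the noise estimate relative to the target estimate (in dB) and that a noisier environment gets a more lenient threshold. All numeric defaults are hypothetical.

```python
def ambient_noise_level(target_level_db, noise_level_db):
    """Assumed definition: noise relative to the target audio, in dB."""
    return noise_level_db - target_level_db


def select_verification_threshold(ambient_db, noise_threshold_db=-10.0,
                                  quiet_threshold=0.8, noisy_threshold=0.6):
    """Pick the speaker-verification threshold for the current noise condition."""
    if ambient_db > noise_threshold_db:
        return noisy_threshold  # assumption: relax verification in noise
    return quiet_threshold


def should_enter_active_mode(speaker_score, target_level_db, noise_level_db):
    """Enter active mode only if the designated-user score clears the threshold."""
    ambient_db = ambient_noise_level(target_level_db, noise_level_db)
    return speaker_score >= select_verification_threshold(ambient_db)
```

With these defaults, a verification score of 0.7 is rejected in a quiet room but accepted when the noise is within 10 dB of the target audio.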
-
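Publication No.: WO2021146857A1
Publication date: 2021-07-29
Application No.: PCT/CN2020/073292
Filing date: 2020-01-20
Applicant: 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
IPC: G10L15/04 , G10L25/03 , G10L15/02 , G10L15/05 , G10L15/144 , G10L15/20 , G10L15/22 , G10L2015/223 , G10L25/24
Abstract: An audio processing method and apparatus. The method comprises: extracting an audio segment from an audio signal according to audio feature information of the audio signal; and determining, based on the audio segment, whether to perform a window recognition operation. The window recognition operation comprises moving a sampling window through the audio signal that follows the audio segment and performing speech recognition on the audio signal within the sampling window. For example, because the voice activity method consumes little computing power while the sliding window method is robust to interference, this application switches automatically between the voice activity method and the sliding window method according to the application scenario. This combines the advantages of both methods, saving computing power and improving the accuracy of audio processing.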
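The switch between the two methods can be sketched as follows. The sketch assumes an energy-based voice activity check for cutting the segment and a hypothetical decision rule: if recognition on the cut segment fails, fall back to sliding a window over the signal that follows it. The abstract does not state the actual decision criterion, and `recognize` is a placeholder callback that returns a result or `None`.

```python
def cut_segment(samples, frame_len, energy_threshold):
    """Return (start, end) indices of the first run of active frames, or None."""
    start = None
    n_frames = len(samples) // frame_len
    for idx in range(n_frames):
        frame = samples[idx * frame_len:(idx + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= energy_threshold and start is None:
            start = idx
        elif energy < energy_threshold and start is not None:
            return start * frame_len, idx * frame_len
    if start is not None:
        return start * frame_len, n_frames * frame_len
    return None


def sliding_windows(samples, begin, window_len, hop):
    """Yield fixed-size sampling windows over the signal from `begin` onward."""
    for s in range(begin, len(samples) - window_len + 1, hop):
        yield samples[s:s + window_len]


def process(samples, recognize, frame_len=4, energy_threshold=0.01,
            window_len=8, hop=4):
    """Try the cheap voice-activity path first, then the sliding-window path."""
    segment = cut_segment(samples, frame_len, energy_threshold)
    if segment is None:
        return []
    start, end = segment
    result = recognize(samples[start:end])
    if result is not None:
        return [result]
    # Assumed rule: recognition on the segment failed, so scan what follows it.
    results = []
    for window in sliding_windows(samples, end, window_len, hop):
        r = recognize(window)
        if r is not None:
            results.append(r)
    return results
```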