METHOD AND APPARATUS FOR PRE-PROCESSING AUDIO SIGNALS
    21.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2014143491A1

    Publication date: 2014-09-18

    Application No.: PCT/US2014/016349

    Filing date: 2014-02-14

    Abstract: The disclosure is directed to pre-processing audio signals. In one implementation, an electronic device (102) receives an audio signal that has audio information, obtains auxiliary information (such as location, velocity, direction, light, proximity of objects, and temperature), and determines, based on the audio information and the auxiliary information, a type of audio environment in which the electronic device (102) is operating. The device (102) selects an audio pre-processing procedure based on the determined audio environment type and pre-processes the audio signal according to the selected pre-processing procedure. The device (102) may then perform speech recognition on the pre-processed audio signal.
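
    The flow in this abstract (classify the audio environment from audio plus auxiliary information, then select a pre-processing procedure) can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the environment labels, the thresholds, the names `classify_environment` and `PROCEDURES`, and the use of simple attenuation as a stand-in for real pre-processing are all assumptions.

```python
from typing import Callable, Dict, List, Tuple


def classify_environment(noise_level_db: float, speed_kmh: float) -> str:
    """Toy classifier: combines audio information (noise level) with
    auxiliary information (speed). Labels and thresholds are hypothetical."""
    if speed_kmh > 20.0:
        return "in_vehicle"
    if noise_level_db > 70.0:
        return "noisy_street"
    return "quiet_room"


def attenuate(samples: List[float], factor: float) -> List[float]:
    """Placeholder pre-processing step: scale every sample."""
    return [s * factor for s in samples]


# One (hypothetical) pre-processing procedure per environment type.
PROCEDURES: Dict[str, Callable[[List[float]], List[float]]] = {
    "in_vehicle": lambda x: attenuate(x, 0.5),
    "noisy_street": lambda x: attenuate(x, 0.7),
    "quiet_room": lambda x: list(x),
}


def preprocess(samples: List[float], noise_level_db: float,
               speed_kmh: float) -> Tuple[str, List[float]]:
    """Determine the environment type, then apply the matching procedure."""
    env = classify_environment(noise_level_db, speed_kmh)
    return env, PROCEDURES[env](samples)
```

    The pre-processed samples would then be passed to a speech recognizer, which is outside the scope of this sketch.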

    SERVER-SIDE ASR ADAPTATION TO SPEAKER, DEVICE AND NOISE CONDITION VIA NON-ASR AUDIO TRANSMISSION
    23.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2014133525A1

    Publication date: 2014-09-04

    Application No.: PCT/US2013/028288

    Filing date: 2013-02-28

    Abstract: A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input. A local controller obtains a sample of non-ASR audio from the input microphone for ASR adaptation to channel-specific ASR characteristics, and then provides a representation of the non-ASR audio to a remote ASR server for server-side adaptation to the channel-specific ASR characteristics, and then provides a representation of an unknown ASR speech input from the input microphone to the remote ASR server for determining ASR results corresponding to the unknown ASR speech input, and then provides the system output to the output interface.

    ADAPTIVE AUDIO SIGNAL SHAPING FOR IMPROVED PLAYBACK IN A NOISY ENVIRONMENT
    24.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2014035845A2

    Publication date: 2014-03-06

    Application No.: PCT/US2013/056544

    Filing date: 2013-08-25

    Applicant: QOSOUND, INC.

    Abstract: Provided is a method for adaptively enhancing an end-user's perceived quality, or quality of experience (QoE), of speech and other audio under ambient noise conditions. The method comprises the steps of determining the ambient noise characteristics on a continuous basis to capture the time varying nature of ambient noises, and adaptively determining the most optimal signal shaping to be applied to the audio/speech signal to produce the most appropriate enhancement to compensate for the ambient noise impairment. The method also comprises a signal shaping technique by using an infinite impulse response (IIR) filter that performs the signal modification with a low delay; a multi-level automatic gain control (AGC); and a controlled amplitude clipping module that assures samples are below a certain limit; and outputs the modified signal for playback through a loudspeaker or the like.
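
    The three-stage shaping chain named in this abstract (a low-delay IIR filter, a multi-level AGC, and controlled amplitude clipping) might look like the sketch below. It is only an illustration under stated assumptions: the first-order filter, the three gain levels, the noise thresholds, and every function name are hypothetical, and a real system would derive the filter from the measured ambient noise.

```python
from typing import List


def iir_one_pole(x: List[float], alpha: float) -> List[float]:
    """First-order IIR filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1].
    It needs only the previous output sample, so it adds no look-ahead delay."""
    y: List[float] = []
    prev = 0.0
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


def agc_gain(noise_rms: float) -> float:
    """Multi-level gain: choose one of a few fixed gains from the ambient
    noise estimate. Levels and thresholds are assumptions."""
    if noise_rms < 0.01:
        return 1.0
    if noise_rms < 0.1:
        return 2.0
    return 4.0


def clip(x: List[float], limit: float) -> List[float]:
    """Controlled amplitude clipping: keep every sample within [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in x]


def shape(x: List[float], noise_rms: float,
          alpha: float = 0.5, limit: float = 1.0) -> List[float]:
    """Filter, apply the selected gain, then clip before playback."""
    filtered = iir_one_pole(x, alpha)
    gain = agc_gain(noise_rms)
    return clip([gain * s for s in filtered], limit)
```

    Clipping runs last so that no combination of filter and gain settings can push samples past the limit.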

    SYSTEMS AND METHODS FOR AUDIO SIGNAL PROCESSING
    25.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2013162994A2

    Publication date: 2013-10-31

    Application No.: PCT/US2013/037109

    Filing date: 2013-04-18

    Abstract: A method for restoring a processed speech signal by an electronic device is described. The method includes obtaining at least one audio signal. The method also includes performing bin-wise voice activity detection based on the at least one audio signal. The method further includes restoring the processed speech signal based on the bin-wise voice activity detection.
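
    To make "bin-wise voice activity detection" concrete, the sketch below makes one speech/non-speech decision per frequency bin and uses those decisions to restore a processed spectrum. The abstract does not say how detection or restoration is performed, so the power-ratio test, the rule of copying back the original bin value, and the names `binwise_vad` and `restore` are all assumptions.

```python
from typing import List


def binwise_vad(power: List[float], noise_floor: List[float],
                ratio_threshold: float = 2.0) -> List[bool]:
    """Flag each frequency bin whose power exceeds its noise floor by the
    given ratio (a hypothetical per-bin decision rule)."""
    return [p > ratio_threshold * n for p, n in zip(power, noise_floor)]


def restore(processed: List[float], original: List[float],
            speech_bins: List[bool]) -> List[float]:
    """In bins flagged as speech, take the original value back; elsewhere
    keep the processed (for example noise-suppressed) value."""
    return [o if is_speech else p
            for p, o, is_speech in zip(processed, original, speech_bins)]
```

    For example, `binwise_vad([4.0, 1.0, 9.0], [1.0, 1.0, 1.0])` flags the first and third bins, so only those two bins are restored.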

NOISE SUPPRESSION METHOD, PROGRAM, AND APPARATUS
    26.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2013132959A1

    Publication date: 2013-09-12

    Application No.: PCT/JP2013/053098

    Filing date: 2013-02-08

    CPC classification number: G10L21/0264 G10L15/20 G10L21/0216

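    Abstract: The invention aims to provide a novel technique for model-based noise suppression in speech recognition. In model-based noise compensation, the invention generates a probability model expressed as the product of two distributions: a probability distribution of the mismatch vector g (or the clean speech x) that has the observed value y as a factor, and a probability distribution of the mismatch vector g (or the clean speech x) that has a per-band reliability index β as a factor. It then performs MMSE estimation on that probability model to obtain the clean speech estimate x^. As a result, each band influences the MMSE estimate with a degree of contribution that matches its reliability. In addition, as the S/N ratio of the observed speech rises, the output value shifts toward the observed value, so the output of the front end is optimized.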
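
    One way to write down the estimator described in this abstract is sketched below. The additive relation between y, x, and g and the normalized-product form are assumptions made for illustration; the abstract states only that the model is a product of the two distributions and that MMSE estimation is applied to it.

```latex
\begin{align}
  y &= x + g
    && \text{(assumed: observation = clean speech + mismatch)} \\
  p(g \mid y, \beta) &\propto p(g \mid y)\, p(g \mid \beta)
    && \text{(product of the two distributions)} \\
  \hat{x} &= y - \mathbb{E}[\, g \mid y, \beta \,]
           = y - \int g \, p(g \mid y, \beta)\, \mathrm{d}g
    && \text{(MMSE estimate of the clean speech)}
\end{align}
```

    Under this reading, the reliability-dependent factor controls how strongly each band pulls the posterior mean of g, and a small expected mismatch at high S/N leaves the estimate close to the observed value y.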

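METHOD AND SYSTEM FOR DETECTING RECORDING-PLAYBACK ATTACKS BASED ON CHANNEL PATTERN NOISE

    Publication No.: WO2013060079A1

    Publication date: 2013-05-02

    Application No.: PCT/CN2011/084868

    Filing date: 2011-12-29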

    CPC classification number: G10L15/20

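    Abstract: The invention relates to the technical fields of intelligent speech signal processing, pattern recognition, and artificial intelligence, and in particular to a method and system, based on channel pattern noise, for detecting recording-playback (replay) attacks in a speaker recognition system. The invention discloses a simpler and more efficient replay attack detection method for speaker recognition systems, with the following steps: (1) input the speech signal to be recognized; (2) pre-process the speech signal; (3) extract the channel pattern noise from the pre-processed speech signal; (4) extract long-term statistical features based on the channel pattern noise; (5) classify the long-term statistical features according to a channel noise classification decision model. The invention uses channel pattern noise to detect replay attacks; the extracted features have low dimensionality, the computational complexity is low, and the misidentification rate is low. It can therefore greatly improve the security of speaker recognition systems and is easier to use in practice.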
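
    The five steps listed in this abstract map naturally onto a small pipeline. The sketch below shows only the shape of that pipeline: DC removal as pre-processing, a moving-average residual as the "channel pattern noise", mean and variance as the long-term statistics, and a fixed variance threshold in place of a trained classification model are all assumptions, as are the function names.

```python
from statistics import mean, pvariance
from typing import List


def preprocess(x: List[float]) -> List[float]:
    """Step 2 (assumed form): remove the DC offset."""
    m = mean(x)
    return [s - m for s in x]


def channel_noise(x: List[float], width: int = 3) -> List[float]:
    """Step 3 (assumed form): residual left after moving-average smoothing."""
    half = width // 2
    residual = []
    for i, sample in enumerate(x):
        window = x[max(0, i - half): i + half + 1]
        residual.append(sample - mean(window))
    return residual


def long_term_features(noise: List[float]) -> List[float]:
    """Step 4 (assumed form): statistics over the whole utterance."""
    return [mean(noise), pvariance(noise)]


def is_replay(features: List[float], variance_threshold: float = 0.05) -> bool:
    """Step 5: placeholder for the trained decision model; the threshold
    and its direction are arbitrary choices for this sketch."""
    return features[1] > variance_threshold


def detect(x: List[float]) -> bool:
    """Steps 1 to 5 chained: input, pre-process, extract, summarize, classify."""
    return is_replay(long_term_features(channel_noise(preprocess(x))))
```

    The feature vector here has only two dimensions, which mirrors the abstract's point that the extracted features are low-dimensional and cheap to compute.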

    PROGRESSIVE ENCODING OF AUDIO
    28.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2012050784A2

    Publication date: 2012-04-19

    Application No.: PCT/US2011/052807

    Filing date: 2011-09-22

    CPC classification number: G10L15/20 G10L15/18 G10L2015/223

    Abstract: The present disclosure includes processing a signal to generate a first sub-set of data, transmitting the first sub-set of data for generation of a reconstructed audio signal, the reconstructed audio signal having a fidelity relative to the signal, processing the signal to generate a second sub-set of data and a third sub-set of data, the second sub-set of data defining a second portion of the signal and comprising data that is different than data of the first sub-set of data, and the third sub-set of data defining a third portion of the signal and comprising data that is different than data of the first and second sub-sets of data, comparing a priority of the second sub-set of data to a priority of the third sub-set of data, and transmitting one of the second sub-set of data and the third sub-set of data over the network for improving the fidelity.
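
    The transmission order described in this abstract (send a first sub-set, then send whichever of the second and third sub-sets has the higher priority) can be illustrated as follows. The `SubSet` type, the convention that a larger number means higher priority, and the tie-breaking rule are assumptions; the abstract does not define how priorities are computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubSet:
    name: str
    priority: int   # assumed convention: larger means more important
    payload: bytes


def choose_next(second: SubSet, third: SubSet) -> SubSet:
    """Compare the two priorities and pick the sub-set to send next.
    Ties go to the second sub-set (an arbitrary choice)."""
    return second if second.priority >= third.priority else third


def transmit_order(first: SubSet, second: SubSet, third: SubSet) -> List[str]:
    """The first sub-set always goes out first so the receiver can build a
    coarse reconstruction; the chosen refinement follows."""
    return [first.name, choose_next(second, third).name]
```

    With a base layer and two refinements of priority 1 and 5, `transmit_order` returns the base layer followed by the priority-5 refinement.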

    SPEECH AND NOISE MODELS FOR SPEECH RECOGNITION
    29.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2011159628A1

    Publication date: 2011-12-22

    Application No.: PCT/US2011/040225

    Filing date: 2011-06-13

    CPC classification number: G10L15/20 G10L21/0208

    Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed, and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.
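
    The gating step in this abstract (adapt the user speech model only when background audio is below a threshold) is sketched below. Representing the speech model as a single running mean is a deliberate oversimplification, and the function name, threshold, and adaptation rate are hypothetical.

```python
def adapt_model(model_mean: float, utterance_mean: float,
                background_level: float,
                threshold: float = 0.1, rate: float = 0.2) -> float:
    """Return the (possibly) adapted model parameter.

    The model is updated only when the background audio is below the
    threshold, so noisy recordings do not contaminate the user model."""
    if background_level >= threshold:
        return model_mean  # too noisy: keep the existing model unchanged
    # Quiet enough: move the model toward the new utterance statistics.
    return (1.0 - rate) * model_mean + rate * utterance_mean
```

    The adapted model would then drive the noise compensation step, which this sketch leaves out.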
