Speaker dependent voiced sound pattern detection thresholds

    Publication Number: US10242677B2

    Publication Date: 2019-03-26

    Application Number: US14835192

    Application Date: 2015-08-25

    Inventor: Alexander Escott

    Abstract: Various implementations disclosed herein include a training module configured to determine a set of detection normalization threshold values associated with speaker dependent voiced sound pattern (VSP) detection. In some implementations, a method includes obtaining segment templates characterizing a concurrent segmentation of a first subset of a plurality of vocalization instances of a VSP, wherein each segment template provides a stochastic characterization of how a particular portion of the VSP is vocalized by a particular speaker; generating a noisy segment matrix using a second subset of the plurality of vocalization instances of the VSP, wherein the noisy segment matrix includes one or more noisy copies of segment representations of the second subset; scoring segments from the noisy segment matrix against the segment templates; and determining detection normalization threshold values at two or more known SNR levels for at least one particular noise type based on a function of the scoring.
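
    A minimal sketch of the threshold-determination step described above, not the patented implementation: noisy copies of held-out vocalization instances are scored against per-segment templates, and a detection normalization threshold is taken from the score distribution at each known SNR level. The diagonal-Gaussian template form, the log-likelihood scoring, and the percentile rule are illustrative assumptions.

```python
import numpy as np

def score_segment(segment, template_mean, template_var):
    """Log-likelihood of a feature segment under a diagonal-Gaussian segment template.
    Higher (less negative) values indicate a closer match."""
    return -0.5 * np.sum((segment - template_mean) ** 2 / template_var + np.log(template_var))

def detection_thresholds(noisy_segments, templates, percentile=5.0):
    """noisy_segments: iterable of (features, segment_index, snr_db) built from the
    second (held-out) subset of vocalization instances with noise added at known SNRs.
    templates: list of (mean, var) pairs, one per VSP segment, from the first subset.
    Returns {snr_db: threshold}, where each threshold is a low percentile of the
    matched-segment scores observed at that SNR."""
    scores_by_snr = {}
    for features, seg_idx, snr_db in noisy_segments:
        mean, var = templates[seg_idx]
        scores_by_snr.setdefault(snr_db, []).append(score_segment(features, mean, var))
    return {snr: float(np.percentile(s, percentile)) for snr, s in scores_by_snr.items()}
```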

Formant based speech reconstruction from noisy signals

    Publication Number: US09020818B2

    Publication Date: 2015-04-28

    Application Number: US13589977

    Application Date: 2012-08-20

    Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations, systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
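
    A hedged sketch of the codebook-growing decision described above: a candidate formant tuple is admitted only if it lies far enough from every existing tuple, and otherwise is folded into its nearest neighbor. The Euclidean distance, the fixed novelty threshold, and the running-average update are illustrative assumptions rather than the patented method.

```python
import numpy as np

def admit_or_update(codebook, counts, candidate, novelty_threshold=150.0):
    """codebook: list of formant tuples (e.g., [F1, F2, F3] in Hz) as np.ndarray entries.
    counts: per-tuple observation counts used for the running-average update.
    Adds `candidate` if it carries enough new information (far from all existing
    tuples); otherwise uses it to refine the nearest existing tuple."""
    candidate = np.asarray(candidate, dtype=float)
    if not codebook:
        codebook.append(candidate)
        counts.append(1)
        return "added"
    dists = [np.linalg.norm(candidate - t) for t in codebook]
    nearest = int(np.argmin(dists))
    if dists[nearest] > novelty_threshold:
        codebook.append(candidate)          # enough new information: new entry
        counts.append(1)
        return "added"
    counts[nearest] += 1                    # otherwise update the existing entry
    codebook[nearest] += (candidate - codebook[nearest]) / counts[nearest]
    return "updated"
```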

    Spectral comb voice activity detection

    Publication Number: US09959886B2

    Publication Date: 2018-05-01

    Application Number: US14099892

    Application Date: 2013-12-06

    CPC classification number: G10L25/78 G10L2025/783 G10L2025/937

    Abstract: The various implementations described enable voice activity detection and/or pitch estimation for speech signal processing in, for example and without limitation, hearing aids, speech recognition and interpretation software, telephony, and various applications for smartphones and/or wearable devices. In particular, some implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by determining a voice activity indicator value that is a normalized function of signal amplitudes associated with at least two sets of spectral locations associated with a candidate pitch. In some implementations, voice activity is considered detected when the voice activity indicator value breaches a threshold value. Additionally and/or alternatively, in some implementations, analysis of the audible signal provides a pitch estimate of detectable voice activity.
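
    A rough sketch of the indicator described above, under illustrative assumptions: one set of spectral locations sits on the harmonics of the candidate pitch (the comb teeth) and a second set sits midway between them, and the indicator is the on-comb amplitude normalized by the total over both sets. The FFT front end, the half-harmonic choice for the second set, and the default threshold are assumptions.

```python
import numpy as np

def comb_vad_indicator(frame, sample_rate, candidate_pitch_hz, n_harmonics=10):
    """Return a normalized voice activity indicator in [0, 1] for one frame:
    spectral amplitude summed at harmonics of the candidate pitch, divided by
    the amplitude at harmonics plus the amplitude midway between harmonics."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bin_hz = sample_rate / len(frame)
    on_comb, off_comb = 0.0, 0.0
    for k in range(1, n_harmonics + 1):
        on_bin = int(round(k * candidate_pitch_hz / bin_hz))
        off_bin = int(round((k + 0.5) * candidate_pitch_hz / bin_hz))
        if off_bin >= len(spectrum):
            break
        on_comb += spectrum[on_bin]
        off_comb += spectrum[off_bin]
    total = on_comb + off_comb
    return on_comb / total if total > 0 else 0.0

def is_voiced(frame, sample_rate, candidate_pitch_hz, threshold=0.6):
    """Voice activity is considered detected when the indicator breaches the threshold."""
    return comb_vad_indicator(frame, sample_rate, candidate_pitch_hz) > threshold
```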

    Phoneme-expert assisted speech recognition and re-synthesis

    Publication Number: US09792897B1

    Publication Date: 2017-10-17

    Application Number: US15203758

    Application Date: 2016-07-06

    Abstract: Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large vocabulary speech sequences without using language-specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, and while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a priori phonetic knowledge. Phonetics is concerned with the configuration of the human vocal tract while speaking and the acoustic consequences for vocalizations. While similar-sounding phonemes are difficult to detect and are frequently misidentified by previously known neural networks, phonetic knowledge gives insight into which aspects of sound acoustics contain the strongest contrast between similar-sounding phonemes. Utilizing features that emphasize the respective second formants allows for more robust sound discrimination between these problematic phonemes.
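
    As an illustration of features that emphasize the second formant, the hedged sketch below estimates formant frequencies for a voiced frame from the roots of an LPC polynomial and reads off F2. The pre-emphasis constant, LPC order, and Levinson-Durbin recursion are standard textbook choices, not the specific feature pipeline of the patented network.

```python
import numpy as np

def lpc_coefficients(frame, order=12):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns the prediction-error filter coefficients [1, a1, ..., a_order]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def formant_frequencies(frame, sample_rate, order=12):
    """Estimate formant frequencies (Hz) from one voiced frame: pre-emphasize,
    window, fit LPC, and read formants off the angles of the LPC roots."""
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    a = lpc_coefficients(emphasized * np.hamming(len(emphasized)), order)
    roots = [z for z in np.roots(a) if z.imag > 1e-6]
    freqs = sorted(np.angle(z) * sample_rate / (2.0 * np.pi) for z in roots)
    return [f for f in freqs if f > 90.0]  # drop spurious near-DC roots

# Features built around F2 (the second entry of the returned list) carry much of
# the contrast between similar-sounding phonemes such as /b/ and /d/.
```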

    Voice signal enhancement

    Publication Number: US09437213B2

    Publication Date: 2016-09-06

    Application Number: US13589954

    Application Date: 2012-08-20

    Abstract: Implementations include systems, methods and/or devices operable to enhance the intelligibility of a target speech signal by targeted voice model based processing of a noisy audible signal. In some implementations, an amplitude-independent voice proximity function voice model is used to attenuate signal components of a noisy audible signal that are unlikely to be associated with the target speech signal and/or accentuate the target speech signal. In some implementations, the target speech signal is identified as a near-field signal, which is detected by identifying a prominent train of glottal pulses in the noisy audible signal. Subsequently, in some implementations, systems, methods and/or devices perform a form of computational auditory scene analysis by converting the noisy audible signal into a set of narrowband time-frequency units, selectively accentuating the time-frequency units associated with the target speech signal, and deemphasizing others using information derived from the identification of the glottal pulse train.
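
    A simplified sketch of the near-field gating idea above, with several assumptions: the prominence of the glottal pulse train is measured as the normalized autocorrelation peak of each frame, and the time-frequency mask simply attenuates frames whose pulse train is not prominent. The real implementations build an amplitude-independent voice proximity model; this only illustrates the conversion to time-frequency units and the selective accentuation step.

```python
import numpy as np
from scipy.signal import stft, istft

def pulse_train_prominence(frame, sample_rate, fmin=70.0, fmax=400.0):
    """Normalized autocorrelation peak over the plausible pitch-lag range; values
    near 1 indicate a prominent (near-field) glottal pulse train."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0
    lo, hi = int(sample_rate / fmax), min(int(sample_rate / fmin), len(ac) - 1)
    return float(np.max(ac[lo:hi]) / ac[0])

def enhance(noisy, sample_rate, frame_len=512, prominence_threshold=0.4, floor=0.1):
    """Convert the noisy signal into narrowband time-frequency units (STFT),
    keep frames whose glottal pulse train is prominent, and attenuate the rest."""
    hop = frame_len // 2  # default 50% overlap
    _, _, tf_units = stft(noisy, fs=sample_rate, nperseg=frame_len,
                          boundary=None, padded=False)
    gains = np.ones(tf_units.shape[1])
    for i in range(tf_units.shape[1]):
        frame = noisy[i * hop:i * hop + frame_len]
        if len(frame) < frame_len or \
           pulse_train_prominence(frame, sample_rate) < prominence_threshold:
            gains[i] = floor
    _, enhanced = istft(tf_units * gains[np.newaxis, :], fs=sample_rate,
                        nperseg=frame_len, boundary=False)
    return enhanced
```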

    Voice activity detection and pitch estimation

    Publication Number: US09384759B2

    Publication Date: 2016-07-05

    Application Number: US13590022

    Application Date: 2012-08-20

    CPC classification number: G10L25/78 G10L25/18 G10L25/90 G10L25/93

    Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
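
    A compact sketch of the sub-band analysis described above, under illustrative assumptions: the speech band is split with simple band-pass filters, each sub-band votes for a pitch lag when its normalized autocorrelation peak is strong enough, and voice activity (with a pitch estimate) is declared when enough sub-bands agree. The filter design, the vote and agreement thresholds, and the band edges are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def subband_pitch_votes(frame, sample_rate, band_edges=(100, 400, 800, 1600, 3200),
                        fmin=70.0, fmax=400.0, min_strength=0.5):
    """Return the pitch lag (in samples) voted for by each sub-band in which
    periodicity dominates, or None for sub-bands masked by noise or interference."""
    lo_lag = int(sample_rate / fmax)
    hi_lag = min(int(sample_rate / fmin), len(frame) - 1)
    votes = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [low, high], btype="bandpass", fs=sample_rate, output="sos")
        sub = sosfiltfilt(sos, frame)
        ac = np.correlate(sub, sub, mode="full")[len(sub) - 1:]
        if ac[0] <= 0:
            votes.append(None)
            continue
        lag = lo_lag + int(np.argmax(ac[lo_lag:hi_lag]))
        votes.append(lag if ac[lag] / ac[0] >= min_strength else None)
    return votes

def detect_voice_and_pitch(frame, sample_rate, min_votes=2):
    """Voice activity is detected when at least `min_votes` sub-bands agree
    (within one sample) on a lag; the pitch estimate follows from that lag."""
    lags = [v for v in subband_pitch_votes(frame, sample_rate) if v is not None]
    if len(lags) < min_votes:
        return False, None
    best = int(np.median(lags))
    if sum(abs(l - best) <= 1 for l in lags) < min_votes:
        return False, None
    return True, sample_rate / best
```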

    Directional filtering of audible signals

    Publication Number: US09241223B2

    Publication Date: 2016-01-19

    Application Number: US14169613

    Application Date: 2014-01-31

    CPC classification number: H04R25/40 H04R3/005 H04R25/43 H04R2430/23

    Abstract: Various implementations described herein include directional filtering of audible signals, which is provided to enable acoustic isolation and localization of a target voice source. Without limitation, various implementations are suitable for speech signal processing applications in hearing aids, speech recognition software, voice-command responsive software and devices, telephony, and various other applications associated with mobile and non-mobile systems and devices. In particular, some implementations include systems, methods and/or devices operable to emphasize at least some of the time-frequency components of an audible signal that originate from a target direction and source, and/or deemphasize at least some of the time-frequency components that originate from one or more other directions or sources. In some implementations, directional filtering includes applying a gain function to audible signal data received from multiple audio sensors. In some implementations, the gain function is determined from the audible signal data and target values associated with directional cues.
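
    A hedged two-microphone sketch of the gain-function step described above: for every time-frequency component, the observed inter-microphone phase difference is compared with the value implied by the target direction, and a soft gain emphasizes components that match while deemphasizing the rest. The Gaussian gain shape, the two-sensor geometry, and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

SPEED_OF_SOUND = 343.0  # m/s

def directional_gain(mic_a, mic_b, sample_rate, target_angle_rad, mic_spacing_m=0.02,
                     nperseg=512, sharpness=4.0, floor=0.05):
    """Apply a per-time-frequency gain to mic_a based on how well the observed
    inter-microphone phase difference matches the target direction's cue."""
    freqs, _, A = stft(mic_a, fs=sample_rate, nperseg=nperseg)
    _, _, B = stft(mic_b, fs=sample_rate, nperseg=nperseg)
    # Expected inter-microphone delay and phase for a source at the target angle.
    delay = mic_spacing_m * np.cos(target_angle_rad) / SPEED_OF_SOUND
    expected_phase = 2 * np.pi * freqs * delay                      # (n_freqs,)
    observed_phase = np.angle(A * np.conj(B))                       # (n_freqs, n_frames)
    mismatch = np.angle(np.exp(1j * (observed_phase - expected_phase[:, None])))
    gain = np.maximum(np.exp(-sharpness * mismatch ** 2), floor)    # emphasize matches
    _, out = istft(A * gain, fs=sample_rate, nperseg=nperseg)
    return out
```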

    Phonotactic-based speech recognition and re-synthesis

    Publication Number: US10297247B2

    Publication Date: 2019-05-21

    Application Number: US15249457

    Application Date: 2016-08-28

    Abstract: Various implementations disclosed herein include a phonotactic post-processor configured to rescore the N-best phoneme candidates output by a primary ensemble phoneme neural network using a priori phonotactic information. In various implementations, one of the scored set of the N-best phoneme candidates is selected as a preferred estimate for a one-phoneme output decision by the phonotactic post-processor. In some implementations, the one-phoneme output decision is an estimate of the most likely detected and recognized phoneme in a frame based on a function of posterior probabilities generated by an ensemble phoneme neural network, as well as phonotactic information and statistical performance characterizations incorporated by the phonotactic post-processor. More specifically, in various implementations, a phonotactic post-processor as described herein utilizes a priori known patterns of phonotactic structure representative of higher-level linguistic structure, instead of configuring the system to learn to recognize the higher-level linguistic structure a posteriori.
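
    A small sketch of the rescoring step described above, with illustrative assumptions: the ensemble network's posterior score for each N-best phoneme candidate is combined log-linearly with an a priori phonotactic bigram probability conditioned on the previously decided phoneme, and the top-scoring candidate becomes the one-phoneme output decision. The combination weight and backoff value are assumptions.

```python
import math

def rescore_nbest(nbest, previous_phoneme, bigram_logprob, phonotactic_weight=0.6):
    """nbest: list of (phoneme, posterior_probability) from the ensemble network.
    bigram_logprob: dict mapping (prev_phoneme, phoneme) -> log P(phoneme | prev),
    i.e., the a priori phonotactic information.
    Returns the rescored N-best list and the preferred one-phoneme decision."""
    rescored = []
    for phoneme, posterior in nbest:
        acoustic = math.log(max(posterior, 1e-12))
        phonotactic = bigram_logprob.get((previous_phoneme, phoneme), math.log(1e-6))
        rescored.append((phoneme, acoustic + phonotactic_weight * phonotactic))
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored, rescored[0][0]

# Example: the network slightly prefers /d/, but after /s/ English phonotactics
# favor /t/ ("st" is common, "sd" is not), so the one-phoneme decision flips to /t/.
nbest = [("d", 0.40), ("t", 0.38), ("k", 0.22)]
bigrams = {("s", "t"): math.log(0.30), ("s", "d"): math.log(0.01), ("s", "k"): math.log(0.10)}
print(rescore_nbest(nbest, "s", bigrams))
```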
