-
Publication Number: US10242677B2
Publication Date: 2019-03-26
Application Number: US14835192
Application Date: 2015-08-25
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Alexander Escott
Abstract: Various implementations disclosed herein include a training module configured to determine a set of detection normalization threshold values associated with speaker-dependent voiced sound pattern (VSP) detection. In some implementations, a method includes obtaining segment templates characterizing a concurrent segmentation of a first subset of a plurality of vocalization instances of a VSP, wherein each segment template provides a stochastic characterization of how a particular portion of the VSP is vocalized by a particular speaker; generating a noisy segment matrix using a second subset of the plurality of vocalization instances of the VSP, wherein the noisy segment matrix includes one or more noisy copies of segment representations of the second subset; scoring segments from the noisy segment matrix against the segment templates; and determining detection normalization threshold values at two or more known SNR levels for at least one particular noise type based on a function of the scoring.
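The threshold calibration described above can be pictured with a minimal numpy sketch. It assumes fixed-length waveform segments, a single long noise recording, plain Euclidean distance as the template score, and a percentile of the true-VSP score distribution as the per-SNR threshold; all names and parameter values are illustrative, not the patented method.

```python
import numpy as np

def detection_thresholds(segment_templates, clean_segments, noise, snr_levels_db,
                         n_copies=20, percentile=95.0, rng=None):
    """Sketch: score noisy copies of held-out VSP segments against the segment
    templates and derive a per-SNR detection normalization threshold."""
    rng = np.random.default_rng() if rng is None else rng
    thresholds = {}
    for snr_db in snr_levels_db:
        scores = []
        for seg in clean_segments:                      # held-out vocalization segments
            sig_power = np.mean(seg ** 2)
            for _ in range(n_copies):                   # rows of the "noisy segment matrix"
                start = rng.integers(0, len(noise) - len(seg))
                n = noise[start:start + len(seg)]
                # scale noise so the mixture sits at the requested SNR
                n = n * np.sqrt(sig_power / (np.mean(n ** 2) * 10 ** (snr_db / 10)))
                noisy = seg + n
                # score = smallest distance to any segment template
                scores.append(min(np.linalg.norm(noisy - t) for t in segment_templates))
        # threshold below which a fixed fraction of true-VSP scores fall at this SNR
        thresholds[snr_db] = np.percentile(scores, percentile)
    return thresholds
```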
-
Publication Number: US09953633B2
Publication Date: 2018-04-24
Application Number: US14806736
Application Date: 2015-07-23
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Clarence Chu , Alireza Kenarsari Anhari
CPC classification number: G10L15/06 , G10L15/02 , G10L15/04 , G10L15/12 , G10L15/20 , G10L2015/081 , G10L2015/088
Abstract: Various implementations disclosed herein include a training module configured to produce a set of segment templates from a concurrent segmentation of a plurality of vocalization instances of a VSP vocalized by a particular speaker, who is identifiable by a corresponding set of vocal characteristics. Each segment template provides a stochastic characterization of how each of one or more portions of a VSP is vocalized by the particular speaker in accordance with the corresponding set of vocal characteristics. Additionally, in various implementations, the training module includes systems, methods and/or devices configured to produce a set of VSP segment maps that each provide a quantitative characterization of how respective segments of the plurality of vocalization instances vary in relation to a corresponding one of a set of segment templates.
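A minimal sketch of what a "stochastic characterization" and a "segment map" could mean in practice, assuming the concurrent segmentation has already aligned every vocalization instance to the same number of fixed-dimension segment feature vectors; the mean/variance model and all names are illustrative only.

```python
import numpy as np

def build_segment_templates(segmented_instances):
    """Sketch: given aligned per-segment feature vectors for several vocalization
    instances of a VSP, build a stochastic template (mean, variance) per segment
    and a per-instance segment map of deviations from that template."""
    # shape: (num_instances, num_segments, feature_dim)
    X = np.asarray(segmented_instances, dtype=float)
    means = X.mean(axis=0)                      # one mean vector per segment
    variances = X.var(axis=0) + 1e-8            # one variance vector per segment
    templates = list(zip(means, variances))
    # segment map: how far each instance's segment sits from its template,
    # expressed in per-dimension standard deviations
    segment_maps = (X - means) / np.sqrt(variances)
    return templates, segment_maps
```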
-
Publication Number: US09020818B2
Publication Date: 2015-04-28
Application Number: US13589977
Application Date: 2012-08-20
Applicant: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
Inventor: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
CPC classification number: G10L19/012 , G10L19/0017 , G10L21/02 , G10L25/15 , G10L25/75 , G10L2019/0007 , H04R25/00
Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine-readable formant-based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
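The add-or-update decision for candidate codebook tuples might be sketched as follows, assuming tuples are plain numeric vectors and that "sufficient new information" is approximated by the distance to the nearest existing tuple; the threshold, learning rate, and function name are hypothetical.

```python
import numpy as np

def update_codebook(codebook, candidate, novelty_threshold=0.5, learning_rate=0.1):
    """Sketch: decide whether a candidate formant tuple carries enough new
    information to be added to the codebook, or should instead refine the
    nearest existing tuple."""
    candidate = np.asarray(candidate, dtype=float)
    if not codebook:
        codebook.append(candidate)
        return codebook
    distances = [np.linalg.norm(candidate - np.asarray(t)) for t in codebook]
    nearest = int(np.argmin(distances))
    if distances[nearest] > novelty_threshold:
        codebook.append(candidate)              # sufficiently novel: add it
    else:
        # not novel enough: nudge the existing tuple toward the candidate
        codebook[nearest] = ((1 - learning_rate) * np.asarray(codebook[nearest])
                             + learning_rate * candidate)
    return codebook
```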
-
Publication Number: US09959886B2
Publication Date: 2018-05-01
Application Number: US14099892
Application Date: 2013-12-06
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Alireza Kenarsari Anhari , Alexander Escott , Pierre Zakarauskas
CPC classification number: G10L25/78 , G10L2025/783 , G10L2025/937
Abstract: The various implementations described enable voice activity detection and/or pitch estimation for speech signal processing in, for example and without limitation, hearing aids, speech recognition and interpretation software, telephony, and various applications for smartphones and/or wearable devices. In particular, some implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by determining a voice activity indicator value that is a normalized function of signal amplitudes associated with at least two sets of spectral locations associated with a candidate pitch. In some implementations, voice activity is considered detected when the voice activity indicator value breaches a threshold value. Additionally and/or alternatively, in some implementations, analysis of the audible signal provides a pitch estimate of detectable voice activity.
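As an illustration of a "normalized function of signal amplitudes associated with at least two sets of spectral locations", the sketch below compares spectral amplitude at harmonics of a candidate pitch against amplitude midway between harmonics; the choice of spectral locations, window, and threshold are assumptions, not the claimed formulation.

```python
import numpy as np

def voice_activity_indicator(frame, fs, candidate_pitch_hz, n_harmonics=8):
    """Sketch: normalized comparison of amplitudes at harmonic vs. off-harmonic
    spectral locations of a candidate pitch; values near 1 suggest voicing."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    bin_hz = freqs[1]
    harm, off = 0.0, 0.0
    for k in range(1, n_harmonics + 1):
        h_bin = int(round(k * candidate_pitch_hz / bin_hz))          # at the harmonic
        o_bin = int(round((k + 0.5) * candidate_pitch_hz / bin_hz))  # midway between harmonics
        if o_bin >= len(spectrum):
            break
        harm += spectrum[h_bin]
        off += spectrum[o_bin]
    return harm / (harm + off + 1e-12)          # normalized to [0, 1]

# usage: voice activity is considered detected when the indicator breaches a
# tuned threshold, e.g. 0.6 (illustrative value)
```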
-
Publication Number: US09792897B1
Publication Date: 2017-10-17
Application Number: US15203758
Application Date: 2016-07-06
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Saeed Mosayyebpour Kaskari , Aanchan Kumar Mohan , Michael David Fry , Dean Wolfgang Neumann
CPC classification number: G10L15/063 , G10L15/02 , G10L15/16 , G10L15/197 , G10L15/20 , G10L2015/025 , G10L2015/0635
Abstract: Various implementations disclosed herein include an expert-assisted phoneme recognition neural network system configured to recognize phonemes within continuous large vocabulary speech sequences without using language specific models (“left-context”), look-ahead (“right-context”) information, or multi-pass sequence processing, while operating within the resource constraints of low-power and real-time devices. To these ends, in various implementations, an expert-assisted phoneme recognition neural network system as described herein utilizes a-priori phonetic knowledge. Phonetics is concerned with the configuration of the human vocal tract while speaking and its acoustic consequences on vocalizations. While similar-sounding phonemes are difficult to detect and are frequently misidentified by previously known neural networks, phonetic knowledge gives insight into which aspects of the sound acoustics contain the strongest contrast between similar-sounding phonemes. Utilizing features that emphasize the respective second formants allows for more robust sound discrimination between these problematic phonemes.
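A minimal sketch of a feature that emphasizes the second-formant region, assuming a fixed F2 band and a flat boost; the band limits and boost value are illustrative and not taken from the patent.

```python
import numpy as np

def f2_emphasized_logspectrum(frame, fs, f2_band=(900.0, 2500.0), boost_db=6.0):
    """Sketch: log-magnitude spectral features with extra weight on the typical
    second-formant (F2) region, where similar-sounding phonemes tend to differ
    most strongly."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    log_spec = 20.0 * np.log10(spectrum)
    in_f2 = (freqs >= f2_band[0]) & (freqs <= f2_band[1])
    log_spec[in_f2] += boost_db                 # emphasize the F2 contrast region
    return log_spec
```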
-
Publication Number: US09437213B2
Publication Date: 2016-09-06
Application Number: US13589954
Application Date: 2012-08-20
Applicant: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
Inventor: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
IPC: G10L19/14 , G10L21/0324 , G10L21/0208 , G10L21/0308 , G10L21/0364
CPC classification number: G10L21/0324 , G10L21/0208 , G10L21/0308 , G10L21/0364 , G10L2021/02082
Abstract: Implementations include systems, methods and/or devices operable to enhance the intelligibility of a target speech signal by targeted voice model based processing of a noisy audible signal. In some implementations, an amplitude-independent voice proximity function voice model is used to attenuate signal components of a noisy audible signal that are unlikely to be associated with the target speech signal and/or accentuate the target speech signal. In some implementations, the target speech signal is identified as a near-field signal, which is detected by identifying a prominent train of glottal pulses in the noisy audible signal. Subsequently, in some implementations systems, methods and/or devices perform a form of computational auditory scene analysis by converting the noisy audible signal into a set of narrowband time-frequency units, and selectively accentuating the time-frequency units associated with the target speech signal and deemphasizing others using information derived from the identification of the glottal pulse train.
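The time-frequency accentuation step might be sketched as follows with scipy, assuming the caller already has a per-bin gain mask derived from the glottal-pulse / proximity analysis; the mask derivation itself is not shown and all parameter values are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def accentuate_target(noisy, fs, tf_gain):
    """Sketch: convert the noisy signal into narrowband time-frequency units,
    apply a gain mask that accentuates units associated with the target speech
    signal and deemphasizes the rest, then resynthesize."""
    f, t, Z = stft(noisy, fs=fs, nperseg=512)
    Z = Z * tf_gain                    # tf_gain has the same shape as Z, values in [0, 1]
    _, enhanced = istft(Z, fs=fs, nperseg=512)
    return enhanced
```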
-
Publication Number: US09384759B2
Publication Date: 2016-07-05
Application Number: US13590022
Application Date: 2012-08-20
Applicant: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
Inventor: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is extended to provide a pitch estimate of the detected voice activity.
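A minimal sketch of sub-band periodicity detection, assuming band splitting in the FFT domain and a normalized autocorrelation peak as the measure of glottal-pulse dominance; the band edges and pitch range are illustrative.

```python
import numpy as np

def subband_pitch_estimate(frame, fs, bands=((100, 800), (800, 1600), (1600, 3200)),
                           fmin=60.0, fmax=400.0):
    """Sketch: look for glottal-pulse periodicity separately in several sub-bands
    of the speech spectrum, so voicing can still be detected in sub-bands where
    it dominates the noise; the strongest periodicity gives a pitch estimate."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    best_strength, best_pitch = 0.0, None
    for lo, hi in bands:
        band = np.where((freqs >= lo) & (freqs < hi), spectrum, 0.0)
        x = np.fft.irfft(band)                          # band-limited time signal
        ac = np.correlate(x, x, mode='full')[len(x) - 1:]
        lags = np.arange(len(ac))
        valid = (lags >= fs / fmax) & (lags <= fs / fmin)
        if not np.any(valid):
            continue
        lag = lags[valid][np.argmax(ac[valid])]
        strength = ac[lag] / (ac[0] + 1e-12)            # normalized periodicity
        if strength > best_strength:
            best_strength, best_pitch = strength, fs / lag
    return best_pitch, best_strength                    # voiced if strength is high enough
```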
-
Publication Number: US09241223B2
Publication Date: 2016-01-19
Application Number: US14169613
Application Date: 2014-01-31
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Clarence S. H. Chu , Alireza Kenarsari Anhari , Alexander Escott , Shawn E. Stevenson , Pierre Zakarauskas
CPC classification number: H04R25/40 , H04R3/005 , H04R25/43 , H04R2430/23
Abstract: Various implementations described herein include directional filtering of audible signals, which is provided to enable acoustic isolation and localization of a target voice source. Without limitation, various implementations are suitable for speech signal processing applications in hearing aids, speech recognition software, voice-command responsive software and devices, telephony, and various other applications associated with mobile and non-mobile systems and devices. In particular, some implementations include systems, methods and/or devices operable to emphasize at least some of the time-frequency components of an audible signal that originate from a target direction and source, and/or to deemphasize at least some of the time-frequency components that originate from one or more other directions or sources. In some implementations, directional filtering includes applying a gain function to audible signal data received from multiple audio sensors. In some implementations, the gain function is determined from the audible signal data and target values associated with directional cues.
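One way to picture a gain function driven by directional cues is the two-microphone sketch below, which uses the inter-microphone phase difference as the cue; the geometry, gain shape, and parameter values are assumptions rather than the claimed method.

```python
import numpy as np
from scipy.signal import stft, istft

def directional_filter(mic1, mic2, fs, mic_distance_m, target_angle_rad=0.0,
                       beta=20.0, c=343.0):
    """Sketch: per time-frequency-bin gain derived from a directional cue (the
    inter-microphone phase difference), emphasizing components whose apparent
    direction matches the target and deemphasizing the rest."""
    f, t, X1 = stft(mic1, fs=fs, nperseg=512)
    _, _, X2 = stft(mic2, fs=fs, nperseg=512)
    # observed inter-mic phase difference per bin
    observed = np.angle(X1 * np.conj(X2))
    # expected phase difference for a source arriving from the target direction
    expected = 2 * np.pi * f[:, None] * mic_distance_m * np.cos(target_angle_rad) / c
    mismatch = np.angle(np.exp(1j * (observed - expected)))   # wrapped to [-pi, pi]
    gain = np.exp(-beta * mismatch ** 2)                      # soft directional gain
    _, out = istft(X1 * gain, fs=fs, nperseg=512)
    return out
```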
-
Publication Number: US09240190B2
Publication Date: 2016-01-19
Application Number: US14659099
Application Date: 2015-03-16
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Pierre Zakarauskas , Alexander Escott , Clarence S. H. Chu , Shawn E. Stevenson
IPC: G10L15/00 , G10L15/14 , G10L15/26 , G10L21/00 , G10L19/012 , G10L21/02 , G10L19/00 , G10L25/75 , H04R25/00 , G10L25/15
CPC classification number: G10L19/012 , G10L19/0017 , G10L21/02 , G10L25/15 , G10L25/75 , G10L2019/0007 , H04R25/00
Abstract: Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine-readable formant-based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
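Complementing the codebook-building sketch shown with the related entry above, the reconstruction path (detect formants, select tuples, resynthesize) might look roughly like this, assuming each codebook tuple stores formant frequencies and amplitudes and that frames are resynthesized as sums of sinusoids; this is an illustration, not the patented synthesis.

```python
import numpy as np

def reconstruct_from_formants(detected_formants, codebook, fs, frame_len=256):
    """Sketch: for each frame, pick the codebook tuple whose formant frequencies
    best match those detected in the noisy signal, then resynthesize the frame
    from the tuple's formant frequencies and amplitudes."""
    t = np.arange(frame_len) / fs
    frames = []
    for frame_formants in detected_formants:            # one array of formant freqs per frame
        dists = [np.linalg.norm(frame_formants - np.asarray(freqs))
                 for freqs, _ in codebook]
        freqs, amps = codebook[int(np.argmin(dists))]
        frame = sum(a * np.sin(2 * np.pi * f0 * t) for f0, a in zip(freqs, amps))
        frames.append(frame)
    return np.concatenate(frames)
```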
-
Publication Number: US10297247B2
Publication Date: 2019-05-21
Application Number: US15249457
Application Date: 2016-08-28
Applicant: Malaspina Labs (Barbados), Inc.
Inventor: Robert Alex Fuhrman
Abstract: Various implementations disclosed herein include a phonotactic post-processor configured to rescore the N-best phoneme candidates output by a primary ensemble phoneme neural network using a priori phonotactic information. In various implementations, one of the scored set of the N-best phoneme candidates is selected as a preferred estimate for a one-phoneme output decision by the phonotactic post-processor. In some implementations, the one-phoneme output decision is an estimate of the most likely detected and recognized phoneme in a frame based on a function of posterior probabilities generated by an ensemble phoneme neural network, as well as phonotactic information and statistical performance characterizations incorporated by the phonotactic post-processor. More specifically, in various implementations, a phonotactic post-processor as described herein utilizes a priori known patterns of phonotactic structure representative of higher-level linguistic structure, instead of configuring the system to learn to recognize the higher-level linguistic structure a posteriori.
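A minimal sketch of phonotactic rescoring, assuming the phonotactic information is reduced to a bigram log-probability table combined log-linearly with the network posteriors; the table, back-off value, and weight are hypothetical.

```python
import numpy as np

def rescore_nbest(nbest, posteriors, prev_phoneme, bigram_logprob, weight=0.5):
    """Sketch: rescore the N-best phoneme candidates from the ensemble network
    with a-priori phonotactic (bigram) information and return the one-phoneme
    output decision for the frame."""
    rescored = []
    for phoneme, post in zip(nbest, posteriors):
        lm = bigram_logprob.get((prev_phoneme, phoneme), -10.0)   # back-off score
        rescored.append((np.log(post + 1e-12) + weight * lm, phoneme))
    rescored.sort(reverse=True)
    return rescored[0][1], rescored           # preferred estimate, full rescored list
```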
-