Automatic pattern recognition using category dependent feature selection
    1.
    Granted invention patent
    Status: No longer in force

    Publication No.: US08380506B2

    Publication date: 2013-02-19

    Application No.: US11998262

    Filing date: 2007-11-29

    IPC classification: G10L15/04

    CPC classification: G06K9/623 G10L15/142

    Abstract: Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category-dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension-expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.

    Method of key-phrase detection and verification for flexible speech understanding
    2.
    Granted invention patent
    Status: No longer in force

    Publication No.: US5797123A

    Publication date: 1998-08-18

    Application No.: US771732

    Filing date: 1996-12-20

    Abstract: A key-phrase detection and verification method that can be advantageously used to realize understanding of flexible (i.e., unconstrained) speech. A "multiple pass" procedure is applied to a spoken utterance comprising a sequence of words (i.e., a "sentence"). First, a plurality of key-phrases are detected (i.e., recognized) based on a set of phrase sub-grammars which may, for example, be specific to the state of the dialogue. These key-phrases are then verified by assigning confidence measures thereto and comparing these confidence measures to a threshold, resulting in a set of verified key-phrase candidates. Next, the verified key-phrase candidates are connected into sentence hypotheses based upon the confidence measures and predetermined (e.g., task-specific) semantic information. And, finally, one or more of these sentence hypotheses are verified to produce a verified sentence hypothesis and, from that, a resultant understanding of the spoken utterance.
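
    The verification pass described in this abstract (assign a confidence measure, compare it to a threshold) can be sketched roughly as follows. This is only an illustration: the `KeyPhrase` record, its fields, and the example threshold are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class KeyPhrase:
    # Hypothetical record for one detected key-phrase candidate.
    text: str
    start: int         # index of the first word in the utterance
    end: int           # index one past the last word
    confidence: float  # confidence measure assigned during verification


def verify_key_phrases(candidates, threshold):
    """Keep candidates whose confidence meets the threshold, in time order."""
    kept = [c for c in candidates if c.confidence >= threshold]
    return sorted(kept, key=lambda c: c.start)


candidates = [
    KeyPhrase("to boston", 3, 5, 0.9),
    KeyPhrase("uh", 2, 3, 0.2),
    KeyPhrase("a ticket", 0, 2, 0.7),
]
verified = verify_key_phrases(candidates, threshold=0.5)
```

    The verified candidates would then be connected into sentence hypotheses; that step depends on task-specific semantic information and is not sketched here.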

    Signal bias removal for robust telephone speech recognition
    3.
    Granted invention patent
    Status: No longer in force

    Publication No.: US5590242A

    Publication date: 1996-12-31

    Application No.: US217035

    Filing date: 1994-03-24

    Abstract: A signal bias removal (SBR) method based on the maximum likelihood estimation of the bias for minimizing undesirable effects in speech recognition systems is described. The technique is readily applicable in various architectures including discrete (vector-quantization based), semicontinuous and continuous-density Hidden Markov Model (HMM) systems. For example, the SBR method can be integrated into a discrete density HMM and applied to telephone speech recognition where the contamination due to extraneous signal components is unknown. To enable real-time implementation, a sequential method for the estimation of the bias (SSBR) is disclosed.
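
    As a rough illustration of sequential bias estimation in a vector-quantization setting, the sketch below keeps a running mean of the residual between each incoming frame and its nearest codeword. The running-mean update, the squared Euclidean distance, and the function name are assumptions made for this example; the abstract does not give the patent's actual update formula.

```python
import numpy as np


def sequential_bias_estimate(frames, codebook):
    """Running estimate of an additive bias, updated one frame at a time.

    Assumed model (not taken from the patent text): each frame is a clean
    codeword plus a constant bias vector.
    """
    bias = np.zeros(frames.shape[1])
    for t, x in enumerate(frames, start=1):
        compensated = x - bias
        # Nearest codeword to the bias-compensated frame.
        nearest = codebook[((codebook - compensated) ** 2).sum(axis=1).argmin()]
        residual = x - nearest
        # Incremental mean of the residuals seen so far.
        bias += (residual - bias) / t
    return bias


codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
frames = codebook[[0, 1, 1, 0]] + np.array([1.0, -1.0])
estimated_bias = sequential_bias_estimate(frames, codebook)
```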

    Recognition unit model training based on competing word and word string models
    4.
    Granted invention patent
    Status: No longer in force

    Publication No.: US5579436A

    Publication date: 1996-11-26

    Application No.: US30895

    Filing date: 1993-03-15

    CPC classification: G10L15/063 G10L15/144

    Abstract: A system for pattern-based speech recognition, e.g., a hidden Markov model (HMM) based speech recognizer using Viterbi scoring. The principle of minimum recognition error rate is applied by the present invention using discriminative training. Various issues related to the special structure of HMMs are presented. Parameter update expressions for HMMs are provided.

    Secure voice transmission
    5.
    Granted invention patent
    Status: No longer in force

    Publication No.: US4612414A

    Publication date: 1986-09-16

    Application No.: US527962

    Filing date: 1983-08-31

    Applicant: Biing-Hwang Juang

    Inventor: Biing-Hwang Juang

    CPC classification: H04K1/00

    Abstract: Voice signals are transmitted over a voiceband telephone channel with a high degree of security and good voice quality by applying to the transmission channel a first signal which includes digital information derived from the vocal tract response of the signal and a second signal which includes continuous information derived from the excitation component of the voice signal.

    Adaptive decision directed speech recognition bias equalization method and apparatus
    6.
    Granted invention patent
    Status: No longer in force

    Publication No.: US5812972A

    Publication date: 1998-09-22

    Application No.: US366657

    Filing date: 1994-12-30

    Abstract: The present invention provides a speech recognizer that creates and updates the equalization vector as input speech is provided to the recognizer. The present invention includes a speech analyzer which transforms an input speech signal into a series of feature vectors or observation sequence. Each feature vector is then provided to a speech recognizer which modifies the feature vector by subtracting a previously determined equalization vector therefrom. The recognizer then performs segmentation and matches the modified feature vector to a stored model vector which is defined as the segmentation vector. The recognizer then, from time to time, determines a new equalization vector, the new equalization vector being defined based on the difference between one or more input feature vectors and their respective segmentation vectors. The new equalization vector may then be used either for performing another segmentation iteration on the same observation sequence or for performing segmentation on subsequent feature vectors.
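
    The loop described in this abstract (subtract the equalization vector, match each modified feature vector to a stored model vector, then recompute the equalization vector from the differences) can be sketched as below. The nearest-neighbour matching, the use of a plain mean for the update, and all function names are assumptions for illustration; the patent's recognizer performs segmentation with its own models.

```python
import numpy as np


def segment(features, model_vectors, eq_vector):
    """Subtract the equalization vector, then pick the nearest model vector per frame."""
    modified = features - eq_vector
    # Squared Euclidean distance from every frame to every model vector.
    dists = ((modified[:, None, :] - model_vectors[None, :, :]) ** 2).sum(axis=2)
    return model_vectors[dists.argmin(axis=1)]


def update_equalization(features, segmentation_vectors):
    """New equalization vector: mean difference between inputs and their matches."""
    return (features - segmentation_vectors).mean(axis=0)


def iterate_equalization(features, model_vectors, n_iter=5):
    """Repeat segmentation on the same observation sequence with the updated vector."""
    eq = np.zeros(features.shape[1])
    for _ in range(n_iter):
        seg = segment(features, model_vectors, eq)
        eq = update_equalization(features, seg)
    return eq


models = np.array([[0.0, 0.0], [10.0, 10.0]])
observed = models[[0, 1, 0, 1]] + np.array([1.0, -1.0])
eq_vector = iterate_equalization(observed, models)
```

    The same `eq_vector` could instead be carried forward and applied to subsequent feature vectors, which is the second use the abstract mentions.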

    Speaker verification with cohort normalized scoring
    7.
    Granted invention patent
    Status: No longer in force

    Publication No.: US5675704A

    Publication date: 1997-10-07

    Application No.: US638401

    Filing date: 1996-04-26

    Abstract: A facility is provided for allowing a caller to place a telephone call by merely uttering a label identifying a desired called destination and to charge the telephone call to a particular billing account by merely uttering a label identifying that account. Alternatively, the caller may place the call by dialing or uttering the telephone number of the called destination or by entering a speed dial code associated with that telephone number. The facility includes a speaker verification system which employs cohort normalized scoring. Cohort normalized scoring provides a dynamic threshold for the verification process making the process more robust to variation in training and verification utterances. Such variation may be caused by, e.g., changes in communication channel characteristics or speaker loudness level.
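
    One common way to realize cohort normalization is to subtract a statistic of the cohort speakers' log-likelihoods from the claimed speaker's log-likelihood, so that a shift affecting all models equally cancels out. The sketch below uses the cohort mean; that choice, the function names, and the zero threshold are assumptions for illustration, not details taken from the patent.

```python
import numpy as np


def cohort_normalized_score(claimed_loglik, cohort_logliks):
    """Claimed speaker's log-likelihood minus the mean cohort log-likelihood."""
    return claimed_loglik - float(np.mean(cohort_logliks))


def verify_speaker(claimed_loglik, cohort_logliks, threshold=0.0):
    """Accept the identity claim when the normalized score exceeds the threshold."""
    return cohort_normalized_score(claimed_loglik, cohort_logliks) > threshold


# A channel change that lowers every log-likelihood by the same amount
# leaves the normalized score unchanged.
clean = cohort_normalized_score(-50.0, [-60.0, -70.0])
shifted = cohort_normalized_score(-70.0, [-80.0, -90.0])
```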

    Content interpolating web proxy server
    8.
    Granted invention patent
    Status: In force

    Publication No.: US08135860B1

    Publication date: 2012-03-13

    Application No.: US09620495

    Filing date: 2000-07-20

    IPC classification: G06F15/16

    Abstract: A content interpolating web proxy server is configured in a computer network for processing retrieved web content so as to place it in a format suitable for presentation on a particular client device such as, e.g., a computer, personal digital assistant (PDA), wireless telephone or voice browser-equipped device. The server processes a client request generated by a client device to determine a particular client type associated with the client device, retrieves web content identified in the client request, retrieves one or more augmentation files associated with the web content and the particular client type, and alters the retrieved web content in accordance with the one or more augmentation files. The altered web content is then delivered to the client device. The one or more augmentation files may be co-located with the web content at a site remote from the proxy server, such that the content owner need not own, maintain or otherwise control the proxy server.

    Source coding and transmission with time diversity
    9.
    Granted invention patent
    Status: In force

    Publication No.: US06715125B1

    Publication date: 2004-03-30

    Application No.: US09420277

    Filing date: 1999-10-18

    Applicant: Biing-Hwang Juang

    Inventor: Biing-Hwang Juang

    IPC classification: G11B2700

    Abstract: A repetitive transmission technique with time diversity which provides improved signal-to-noise ratio (SNR) in the presence of packet loss. Time shifts are introduced between N versions of a particular block of information to be transmitted, and the time-shifted versions are encoded in a set of N encoders and transmitted as N packets. The time shift introduced between a given pair of the N versions corresponds to approximately 1/N of the time duration of a particular one of the versions. The SNR of a composite reconstructed signal generated from the N packets with the introduced time shift in a receiver of the system is approximately the same as would be obtained using a set of N independent encoders to generate the plurality of packets without the introduced time shifts. The gain in the SNR of the composite reconstructed signal attributable to the introduction of the time shifts is 10 log10(N′), where N′ = 1, ..., N is the total number of the N packets actually received at the system receiver. A further improvement in SNR performance may be obtained by introducing quantization error compensation, in which quantization error from the encoding of a given one of the versions is successively combined with subsequent versions prior to encoding of those versions.
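
    The two quantities stated in this abstract, the per-version time shift of roughly 1/N of the version duration and the SNR gain of 10 log10(N′), can be worked through numerically. The helper names below are hypothetical, and the gain is expressed in decibels on the assumption that the 10 log10 form is a decibel ratio.

```python
import math


def time_shifts(duration, n_versions):
    """Offsets of the N versions, each shifted by duration / N from the previous one."""
    return [k * duration / n_versions for k in range(n_versions)]


def snr_gain_db(n_received):
    """SNR gain 10 * log10(N') when N' of the N packets are actually received."""
    if n_received < 1:
        raise ValueError("at least one packet must be received")
    return 10.0 * math.log10(n_received)


shifts = time_shifts(20.0, 4)                   # offsets for a 20-unit block, N = 4
gains = [snr_gain_db(k) for k in range(1, 5)]   # gain for N' = 1..4
```

    With a single received packet the gain is 0 dB, and each doubling of the number of received packets adds about 3 dB.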

    Systems, methods and articles of manufacture for performing high resolution N-best string hypothesization
    10.
    Granted invention patent
    Status: No longer in force

    Publication No.: US5805772A

    Publication date: 1998-09-08

    Application No.: US366843

    Filing date: 1994-12-30

    CPC classification: G10L15/08 G10L15/197

    Abstract: Disclosed are systems, methods and articles of manufacture for performing high resolution N-best string hypothesization during speech recognition. A received input signal, representing a speech utterance, is processed utilizing a plurality of recognition models to generate one or more string hypotheses of the received input signal. The plurality of recognition models preferably include one or more inter-word context dependent models and one or more language models. A forward partial path map is produced according to the allophonic specifications of at least one of the inter-word context dependent models and the language models. The forward partial path map is traversed in the backward direction as a function of the allophonic specifications to generate the one or more string hypotheses. One or more of the recognition models may represent one-phone words.
