Voice recognition system method and apparatus
    1.
    Granted invention patent
    Voice recognition system method and apparatus (in force)

    Publication No.: US06941265B2

    Publication date: 2005-09-06

    Application No.: US10017270

    Filing date: 2001-12-14

    IPC classification: G10L15/28 G10L15/00

    CPC classification: G10L15/28

    Abstract: Generally stated, a method and an accompanying apparatus provide a voice recognition system (300) with a programmable front end processing unit (400). The front end processing unit (400) requests and receives different configuration files at different times for processing voice data in the voice recognition system (300). The configuration files are communicated to the front end unit via a communication link (310) for configuring the front end processing unit (400). A microprocessor may provide the front end configuration files on the communication link at different times.


    Voice recognition rejection scheme
    3.
    Granted invention patent
    Voice recognition rejection scheme (in force)

    Publication No.: US06574596B2

    Publication date: 2003-06-03

    Application No.: US09248513

    Filing date: 1999-02-08

    IPC classification: G10L15/04

    CPC classification: G10L15/10 G10L15/22

    Abstract: A voice recognition rejection scheme for capturing an utterance includes the steps of accepting the utterance, applying an N-best algorithm to the utterance, or rejecting the utterance. The utterance is accepted if a first predefined relationship exists between one or more closest comparison results for the utterance with respect to a stored word and one or more differences between the one or more closest comparison results and one or more other comparison results between the utterance and one or more other stored words. An N-best algorithm is applied to the utterance if a second predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. The utterance is rejected if a third predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. One of the one or more other comparison results may advantageously be a next-closest comparison result for the utterance and another stored word. The first, second, and third predefined relationships may advantageously be linear relationships.

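
The three-way decision in the abstract above can be illustrated with a minimal Python sketch. It assumes that comparison results are distances (smaller means a closer match), that only the closest and next-closest results are used, and that the "predefined relationships" are two straight lines in the (closest, difference) plane. The `Line` class, the function name, and all coefficients are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Linear boundary: difference = slope * closest + intercept (illustrative values only)."""
    slope: float
    intercept: float

    def at(self, closest: float) -> float:
        return self.slope * closest + self.intercept


def classify_utterance(distances, accept_line=Line(0.5, 10.0), reject_line=Line(0.25, 2.0)):
    """Return 'accept', 'n_best', or 'reject' for a list of distances to stored words."""
    if len(distances) < 2:
        raise ValueError("need comparison results for at least two stored words")
    closest, next_closest = sorted(distances)[:2]
    difference = next_closest - closest  # margin between the best and second-best match
    if difference >= accept_line.at(closest):
        return "accept"   # best match is clearly separated from the runner-up
    if difference < reject_line.at(closest):
        return "reject"   # best match is barely better than the runner-up
    return "n_best"       # ambiguous region: fall back to an N-best list


print(classify_utterance([20.0, 60.0, 80.0]))  # accept
print(classify_utterance([20.0, 30.0, 80.0]))  # n_best
print(classify_utterance([20.0, 22.0, 80.0]))  # reject
```

With these assumed coefficients the accept line lies above the reject line for every non-negative distance, so the three regions never overlap.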

    Voice recognition user interface for telephone handsets
    4.
    Granted invention patent
    Voice recognition user interface for telephone handsets (in force)

    Publication No.: US06449496B1

    Publication date: 2002-09-10

    Application No.: US09246499

    Filing date: 1999-02-08

    IPC classification: H04B1/38

    CPC classification: H04M1/271

    Abstract: A method and apparatus provide a user interface within a phone that responds to a limited vocabulary of user-trained voice commands. The interface allows users to perform all phone handset dialing functions using voice commands. Additionally, users can create and modify entries within a voice recognition phonebook, whereby a number within the voice recognition phonebook can be called by saying the name associated with the number. The user interface provides a combination of voice and LCD-displayed user prompts and responses to voice input. The interface responds to user voice commands and performs the command functions based upon matches to previously user-trained voice command vocabulary words stored in memory.


    System and method for segmentation and recognition of speech signals
    5.
    Granted invention patent
    System and method for segmentation and recognition of speech signals (in force)

    Publication No.: US06278972B1

    Publication date: 2001-08-21

    Application No.: US09225891

    Filing date: 1999-01-04

    IPC classification: G01L15/04

    CPC classification: G10L15/04

    Abstract: A system and method for forming a segmented speech signal from an input speech signal having a plurality of frames. The input speech signal is converted from a time domain signal to a frequency domain signal having a plurality of speech frames, wherein each speech frame in the frequency domain signal is represented by at least one spectral value associated with the speech frame. A spectral difference value is then determined for each pair of adjacent frames in the frequency domain signal, wherein the spectral difference value for each pair of adjacent frames is representative of a difference between the at least one spectral value associated with each frame in the pair of adjacent frames. An initial cluster boundary is set between each pair of adjacent frames in the frequency domain signal, and a variance value is assigned to each cluster in the frequency domain signal, wherein the variance value for each cluster is equal to one of the determined spectral difference values. Next, a plurality of cluster merge parameters is calculated, wherein each of the cluster merge parameters is associated with a pair of adjacent clusters in the frequency domain signal. A minimum cluster merge parameter is selected from the plurality of cluster merge parameters. A merged cluster is then formed by canceling a cluster boundary between the clusters associated with the minimum merge parameter and assigning a merged variance value to the merged cluster, wherein the merged variance value is representative of the variance values assigned to the clusters associated with the minimum merge parameter. The process is repeated in order to form a plurality of merged clusters, and the segmented speech signal is formed in accordance with the plurality of merged clusters.

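
The abstract above describes a bottom-up procedure: start with one cluster per frame, repeatedly remove the boundary between the pair of adjacent clusters that has the smallest merge parameter, and stop once the desired segments remain. The abstract does not define the merge parameter or the merged variance, so the Python sketch below substitutes a Ward-style cost (the increase in within-cluster squared error) as a stand-in. That substitution, the function name, and the `num_segments` stopping rule are assumptions, not the patented formulas.

```python
def segment_frames(spectra, num_segments):
    """Greedily merge adjacent frame clusters until num_segments remain.

    spectra: list of equal-length lists of floats (one spectral vector per frame).
    Returns a list of (start, end) frame index pairs, with end exclusive.
    """
    n = len(spectra)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")

    # Each cluster: [start, end, per-dimension sum, sum of squared values].
    clusters = [[i, i + 1, list(f), sum(x * x for x in f)] for i, f in enumerate(spectra)]

    def squared_error(count, total, sumsq):
        # Sum of squared deviations from the cluster mean.
        return sumsq - sum(t * t for t in total) / count

    def merge_cost(left, right):
        # Assumed merge parameter: growth in squared error if the boundary is removed.
        total = [a + b for a, b in zip(left[2], right[2])]
        merged = squared_error(right[1] - left[0], total, left[3] + right[3])
        return (merged
                - squared_error(left[1] - left[0], left[2], left[3])
                - squared_error(right[1] - right[0], right[2], right[3]))

    while len(clusters) > num_segments:
        costs = [merge_cost(clusters[i], clusters[i + 1]) for i in range(len(clusters) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)  # minimum merge parameter
        left, right = clusters[i], clusters[i + 1]
        clusters[i:i + 2] = [[left[0], right[1],
                              [a + b for a, b in zip(left[2], right[2])],
                              left[3] + right[3]]]
    return [(c[0], c[1]) for c in clusters]


frames = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]]
print(segment_frames(frames, 2))  # [(0, 3), (3, 6)]
```

Only adjacent clusters are ever merged, so every segment is a contiguous run of frames, which is the property the abstract relies on.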

    Zero disparity plane for feedback-based three-dimensional video
    6.
    Granted invention patent
    Zero disparity plane for feedback-based three-dimensional video (in force)

    Publication No.: US09049423B2

    Publication date: 2015-06-02

    Application No.: US12958107

    Filing date: 2010-12-01

    Abstract: The techniques of this disclosure are directed to the feedback-based stereoscopic display of three-dimensional images, such as may be used for video telephony (VT) and human-machine interface (HMI) applications. According to one example, a region of interest (ROI) of stereoscopically captured images may be automatically determined based on determining disparity for at least one pixel of the captured images. According to another example, a zero disparity plane (ZDP) for the presentation of a 3D representation of stereoscopically captured images may be determined based on an identified ROI. According to this example, the ROI may be automatically identified, or identified based on receipt of user input identifying the ROI.


    ZERO DISPARITY PLANE FOR FEEDBACK-BASED THREE-DIMENSIONAL VIDEO
    7.
    Invention patent application
    ZERO DISPARITY PLANE FOR FEEDBACK-BASED THREE-DIMENSIONAL VIDEO (in force)

    Publication No.: US20120140038A1

    Publication date: 2012-06-07

    Application No.: US12958107

    Filing date: 2010-12-01

    IPC classification: H04N13/02 G06K9/00

    Abstract: The techniques of this disclosure are directed to the feedback-based stereoscopic display of three-dimensional images, such as may be used for video telephony (VT) and human-machine interface (HMI) applications. According to one example, a region of interest (ROI) of stereoscopically captured images may be automatically determined based on determining disparity for at least one pixel of the captured images. According to another example, a zero disparity plane (ZDP) for the presentation of a 3D representation of stereoscopically captured images may be determined based on an identified ROI. According to this example, the ROI may be automatically identified, or identified based on receipt of user input identifying the ROI.


    Method and apparatus for accurate endpointing of speech in the presence of noise
    8.
    Granted invention patent
    Method and apparatus for accurate endpointing of speech in the presence of noise (in force)

    Publication No.: US06324509B1

    Publication date: 2001-11-27

    Application No.: US09246414

    Filing date: 1999-02-08

    IPC classification: G10L15/04

    CPC classification: G10L25/87 G10L2025/786

    Abstract: An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares with a second SNR threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance. The processor also then compares with the second SNR threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.

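
The two-threshold search in the abstract above can be sketched in a few lines of Python. The sketch assumes the utterance is already available as a list of per-frame SNR estimates in dB and that the endpoints are found by a simple outward scan. The function name and both threshold values are hypothetical, and the periodic recalculation of the thresholds is omitted.

```python
def find_endpoints(frame_snr_db, high_db=15.0, low_db=6.0):
    """Return (start, end) frame indices (inclusive) of the utterance, or None if no speech."""
    # First pass: the higher threshold gives conservative first start and end points.
    above = [i for i, snr in enumerate(frame_snr_db) if snr >= high_db]
    if not above:
        return None
    start, end = above[0], above[-1]

    # Second pass: walk backward from the first start point using the lower threshold.
    while start > 0 and frame_snr_db[start - 1] >= low_db:
        start -= 1
    # Second pass: walk forward from the first end point using the lower threshold.
    while end < len(frame_snr_db) - 1 and frame_snr_db[end + 1] >= low_db:
        end += 1
    return start, end


snr = [1, 2, 8, 10, 20, 25, 18, 9, 7, 3, 1]
print(find_endpoints(snr))  # (2, 8)
```

The higher threshold locates the loud core of the utterance (frames 4 to 6 here), and the lower threshold then recovers the weaker onset and trailing frames that the first pass would have clipped.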

    Content-adaptive systems, methods and apparatus for determining optical flow
    10.
    Granted invention patent
    Content-adaptive systems, methods and apparatus for determining optical flow (in force)

    Publication No.: US08553943B2

    Publication date: 2013-10-08

    Application No.: US13160457

    Filing date: 2011-06-14

    IPC classification: G06K9/00 H04N7/18

    Abstract: Embodiments include methods and systems which determine pixel displacement between frames based on a respective weighting-value for each pixel or a group of pixels. The weighting-values provide an indication as to which pixels are more pertinent to optical flow computations. Computational resources and effort can be focused on pixels with higher weights, which are generally more pertinent to optical flow determinations.

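
As a rough illustration of weighting pixels in a displacement estimate, the Python sketch below solves a weighted least-squares form of the brightness-constancy equation for a single global (dx, dy) between two frames. The abstract does not say how the weighting-values are computed. Using gradient energy as the weight, skipping pixels below `weight_floor`, and estimating one displacement for the whole frame rather than a per-pixel flow field are all assumptions made for this example.

```python
def weighted_displacement(prev, curr, weight_floor=1e-3):
    """Estimate one (dx, dy) shift from prev to curr; both are 2-D lists of floats."""
    height, width = len(prev), len(prev[0])
    a11 = a12 = a22 = b1 = b2 = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference spatial gradients, averaged over both frames.
            ix = (prev[y][x + 1] - prev[y][x - 1] + curr[y][x + 1] - curr[y][x - 1]) / 4.0
            iy = (prev[y + 1][x] - prev[y - 1][x] + curr[y + 1][x] - curr[y - 1][x]) / 4.0
            it = curr[y][x] - prev[y][x]  # temporal difference
            weight = ix * ix + iy * iy    # assumed weighting-value: gradient energy
            if weight < weight_floor:
                continue                  # spend no effort on low-weight (flat) pixels
            a11 += weight * ix * ix
            a12 += weight * ix * iy
            a22 += weight * iy * iy
            b1 -= weight * ix * it
            b2 -= weight * iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("not enough gradient variation to estimate a displacement")
    # Solve the 2x2 weighted normal equations with Cramer's rule.
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def make_frame(size, u, v):
    # Smooth synthetic test image: a quadratic bowl centered at (u, v).
    return [[(x - u) ** 2 + (y - v) ** 2 for x in range(size)] for y in range(size)]


dx, dy = weighted_displacement(make_frame(8, 0.0, 0.0), make_frame(8, 0.5, -0.25))
print(round(dx, 6), round(dy, 6))
```

For the quadratic test image the averaged gradients satisfy the brightness-constancy equation exactly, so the estimate should recover the shift of (0.5, -0.25) up to floating-point error. Real images would satisfy it only approximately.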