Patent search ap:("Sankar Basu" OR "Milind R. Naphade" OR "John R. Smith") AND inv:"Sankar Basu" Page 1

1.

发明授权
Adaptive probabilistic query expansion 有权
Title translation: 自适应概率查询扩展

公开(公告)号：US07437349B2

公开(公告)日：2008-10-14

申请号：US10143146

申请日：2002-05-10

Applicant: Sankar Basu , Milind R. Naphade , John R. Smith

Inventor： Sankar Basu , Milind R. Naphade , John R. Smith

IPC: G06F17/30

CPC classification number: G06F17/30672 , Y10S707/99933 , Y10S707/99934 , Y10S707/99935

Abstract: A method, system and computer program for adaptively processing a query search. An expanding operation is utilized to expand the query into sub-queries, wherein at least one of the sub-queries is expanded probabilistically. A retrieving operation retrieves the results of the sub-queries, and a merging operation is used to merge the sub-query results into a search result. An adapting operation is configured to modify the search such that the relevance of the search result is increased when the search is repeated.

Abstract translation: 一种用于自适应处理查询搜索的方法，系统和计算机程序。使用扩展操作来将查询扩展到子查询中，其中至少一个子查询被概率地扩展。检索操作检索子查询的结果，并且使用合并操作将子查询结果合并到搜索结果中。配置操作被配置为修改搜索，使得当重复搜索时增加搜索结果的相关性。

2.

发明授权
Late integration in audio-visual continuous speech recognition 有权
Title translation: 视听连续语音识别的后期整合

公开(公告)号：US06633844B1

公开(公告)日：2003-10-14

申请号：US09452919

申请日：1999-12-02

Applicant: Ashish Verma , Sankar Basu , Chalapathy Neti

Inventor： Ashish Verma , Sankar Basu , Chalapathy Neti

IPC: G10L1504

CPC classification number: G10L15/25

Abstract: The combination of audio and video speech recognition in a manner to improve the robustness of speech recognition systems in noisy environments. Contemplated are methods and apparatus in which a video signal associated with a video source and an audio signal associated with the video signal are processed, the most likely viseme associated with the audio signal and video signal is determined and, thereafter, the most likely phoneme associated with the audio signal and video signal is determined.

Abstract translation: 音频和视频语音识别的组合以提高语音识别系统在嘈杂环境中的鲁棒性的方式。考虑到其中处理与视频源相关联的视频信号和与视频信号相关联的音频信号的方法和装置，确定与音频信号和视频信号相关联的最可能的视觉，并且之后，最可能的音素相关联音频信号和视频信号被确定。

3.

发明授权
Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition 失效
Title translation: 用于语音识别中声学特征向量分类的非整数密度估计

公开(公告)号：US06269334B1

公开(公告)日：2001-07-31

申请号：US09104553

申请日：1998-06-25

Applicant: Sankar Basu , Charles A. Micchelli

Inventor： Sankar Basu , Charles A. Micchelli

IPC: G10L1300

CPC classification number: G10L15/02

Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of nongaussion statistical probability densities which provides improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t&agr;/2) for t≧0 and &agr;>0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.

Abstract translation: 用于语音自动机器识别的统计建模范例使用非功能统计概率密度的混合，提高了识别精度。通过从t> = 0和alpha> 0的形式exp（-talpha / 2）的函数构建概率密度来建模语音。混合物组分由不同的单变量功能构成。混合模型用于语音数据的最大似然模型。

4.

发明授权
Method and apparatus for audio-visual speech detection and recognition 有权
Title translation: 用于视听语音检测和识别的方法和装置

公开(公告)号：US06816836B2

公开(公告)日：2004-11-09

申请号：US10231676

申请日：2002-08-30

Applicant: Sankar Basu , Philippe Christian de Cuetos , Stephane Herman Maes , Chalapathy Venkata Neti , Andrew William Senior

Inventor： Sankar Basu , Philippe Christian de Cuetos , Stephane Herman Maes , Chalapathy Venkata Neti , Andrew William Senior

IPC: G10L1500

CPC classification number: G06K9/00228 , G06K9/00335 , G10L15/25 , G10L25/78

Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

5.

发明授权
Wavelet-based energy binning cepstal features for automatic speech recognition 有权
Title translation: 用于自动语音识别的基于小波的能量收集cepstal特征

公开(公告)号：US06253175B1

公开(公告)日：2001-06-26

申请号：US09201055

申请日：1998-11-30

Applicant: Sankar Basu , Stephane H. Maes

Inventor： Sankar Basu , Stephane H. Maes

IPC: G10L1500

CPC classification number: G10L15/02 , G10L25/27 , G10L2015/0631

Abstract: Systems and methods for processing acoustic speech signals which utilize the wavelet transform (and alternatively, the Fourier transform) as a fundamental tool. The method essentially involves “synchrosqueezing” spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm. The amplitude, frequency and bandwidth of each of the components are, thus, extracted. The cepstrum generated from this information is referred to as “K-mean Wastrum.” In another aspect, the result of the K-mean clustering process is further processed to limit the set of primary components to formants. The resulting features are referred to as “formant-based wastrum.” Formants are interpolated in unvoiced regions and the contribution of unvoiced turbulent part of the spectrum are added. This method requires adequate formant tracking. The resulting robust formant extraction has a number of applications in speech processing and analysis including vocal tract normalization.

Abstract translation: 用于处理利用小波变换（或者替代地，傅立叶变换）作为基本工具的声学语音信号的系统和方法。该方法基本上涉及通过对数字化语音信号执行小波变换（或傅立叶变换）而获得的“同步挤压”频谱分量数据。在一个方面，同步挤压平面的频谱分量通过K均值聚类算法动态跟踪。因此，提取每个组件的振幅，频率和带宽。从该信息产生的倒谱被称为“K均值膜”。在另一方面，进一步处理K均值聚类过程的结果以将主要组分的集合限制为共轭体。所得到的特征被称为“基于共振峰的水泥”。在无声区域中插入共聚体，并加入谱的无声湍流部分的贡献。这种方法需要足够的共振峰跟踪。所得到的鲁棒共振峰提取在语音处理和分析中具有许多应用，包括声道归一化。

6.

发明授权
Methods and apparatus for audio-visual speaker recognition and utterance verification 有权
Title translation: 视听说话者识别和话语验证的方法和装置

公开(公告)号：US06219640B1

公开(公告)日：2001-04-17

申请号：US09369706

申请日：1999-08-06

Applicant: Sankar Basu , Homayoon S. M. Beigi , Stephane Herman Maes , Benoit Emmanuel Ghislain Maison , Chalapathy Venkata Neti , Andrew William Senior

Inventor： Sankar Basu , Homayoon S. M. Beigi , Stephane Herman Maes , Benoit Emmanuel Ghislain Maison , Chalapathy Venkata Neti , Andrew William Senior

IPC: G10L1500

CPC classification number: G06K9/6293 , G06K9/00221 , G06K9/00885 , G07C9/00158 , G10L2015/226

Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification. In a supervised utterance verification embodiment, the processed video signal is compared with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.

Abstract translation: 用于执行说话者识别的方法和装置包括处理与任意内容视频源相关联的视频信号并处理与视频信号相关联的音频信号。然后，基于经处理的音频信号和处理的视频信号进行识别和/或验证决定。可以采用各种决策实施例，包括但不限于分数组合方法，特征组合方法和重新评分方法。在本发明的另一方面，验证语音发声的方法包括处理与视频源相关联的视频信号并处理与视频信号相关联的音频信号。然后，将经处理的音频信号与经处理的视频信号进行比较，以确定信号之间的相关性水平。这被称为无监督话语验证。在受监督的话语验证实施例中，将处理的视频信号与表示与视频信号相关联的音频信号的脚本进行比较，以确定信号之间的相关性水平。

7.

发明授权
Impulsivity estimates of mixtures of the power exponential distrubutions in speech modeling 失效
Title translation: 功率指数分布在语音建模中的混合的冲动性估计

公开(公告)号：US06804648B1

公开(公告)日：2004-10-12

申请号：US09275782

申请日：1999-03-25

Applicant: Sankar Basu , Charles A. Micchelli , Peder A. Olsen

Inventor： Sankar Basu , Charles A. Micchelli , Peder A. Olsen

IPC: G10L1528

CPC classification number: G10L15/144

Abstract: A parametric family of multivariate density functions formed by mixture models from univariate functions of the type exp(−|x|&bgr;) for modeling acoustic feature vectores are used in automatic recognition of speech. The parameter &bgr; is used to measure the non-Gaussian nature of the data. &bgr; is estimated from the input data using a maximum likelihood criterion. There is a balance between &bgr; and the number of data points that must be satisfied for efficient estimation.

Abstract translation: 用于建模声学特征矢量的类型为exp（ - | x |β）的单变量函数的混合模型形成的多变量密度函数的参数族被用于语音的自动识别。参数β用于测量数据的非高斯性质。使用最大似然准则从输入数据估计β。在有效估计之间必须满足beta和数据点数之间的平衡。

8.

发明授权
Maximum entropy and maximum likelihood criteria for feature selection from multivariate data 有权
Title translation: 用于多变量数据的特征选择的最大熵和最大似然准则

公开(公告)号：US06609094B1

公开(公告)日：2003-08-19

申请号：US09576429

申请日：2000-05-22

Applicant: Sankar Basu , Charles A. Micchelli , Peder Olsen

Inventor： Sankar Basu , Charles A. Micchelli , Peder Olsen

IPC: G10L1508

CPC classification number: G06K9/623 , G10L15/02

Abstract: Improvements in speech recognition systems are achieved by considering projections of the high dimensional data on lower dimensional subspaces, subsequently by estimating the univariate probability densities via known univariate techniques, and then by reconstructing the density in the original higher dimensional space from the collection of univariate densities so obtained. The reconstructed density is by no means unique unless further restrictions on the estimated density are imposed. The variety of choices of candidate univariate densities as well as the choices of subspaces on which to project the data including their number further add to this non-uniqueness. Probability density functions are then considered that maximize certain optimality criterion as a solution to this problem. Specifically, those probability density function's that either maximize the entropy functional, or alternatively, the likelihood associated with the data are considered.

Abstract translation: 通过考虑对低维子空间上的高维数据的预测，随后通过已知的单变量技术估计单变量概率密度，然后通过从单变量密度的收集重建原始高维空间中的密度来实现语音识别系统的改进如此获得。重建密度绝不是独一无二的，除非对估计密度的进一步限制。候选单变量密度的选择种类以及项目数据（包括其数量）的子空间选择进一步增加了这种非唯一性。因此，概率密度函数被认为是最大化某些最优性准则作为解决这一问题的方法。具体地说，考虑使熵函数最大化的概率密度函数，或替代地，与数据相关联的可能性。

9.

发明授权
Methods and apparatus for audio-visual speech detection and recognition 有权
Title translation: 视听语音检测和识别的方法和装置

公开(公告)号：US06594629B1

公开(公告)日：2003-07-15

申请号：US09369707

申请日：1999-08-06

Applicant: Sankar Basu , Philippe Christian de Cuetos , Stephane Herman Maes , Chalapathy Venkata Neti , Andrew William Senior

Inventor： Sankar Basu , Philippe Christian de Cuetos , Stephane Herman Maes , Chalapathy Venkata Neti , Andrew William Senior

IPC: G10L1500

CPC classification number: G06K9/00228 , G06K9/00335 , G10L15/25 , G10L25/78

Abstract: In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjunction with the processed video signal to generate a decoded output signal representative of the audio signal. In a second aspect 6f the invention, methods and apparatus for providing speech detection in accordance with a speech recognition system comprise the steps of processing a video signal associated with a video source to detect whether one or more features associated with the video signal are representative of speech, and processing an audio signal associated with the video signal in accordance with the speech recognition system to generate a decoded output signal representative of the audio signal when the one or more features associated with the video signal are representative of speech. Speech detection may also be performed using information from both the video path and the audio path simultaneously.

Abstract translation: 在本发明的第一方面，用于提供语音识别的方法和装置包括以下步骤：处理与任意内容视频源相关联的视频信号，处理与视频信号相关联的音频信号，以及结合处理的音频信号处理的视频信号以产生表示音频信号的解码输出信号。在本发明的第二方面6f中，根据语音识别系统提供语音检测的方法和装置包括以下步骤：处理与视频源相关联的视频信号，以检测与视频信号相关联的一个或多个特征是否代表并且当与视频信号相关联的一个或多个特征代表语音时，根据语音识别系统处理与视频信号相关联的音频信号，以产生表示音频信号的解码输出信号。也可以使用来自视频路径和音频路径的信息同时执行语音检测。

10.

发明授权
Speech driven lip synthesis using viseme based hidden markov models 有权
Title translation: 使用基于Viseme的隐马尔可夫模型的语音驱动唇形合成

公开(公告)号：US06366885B1

公开(公告)日：2002-04-02

申请号：US09384763

申请日：1999-08-27

Applicant: Sankar Basu , Tanveer Atzal Faruquie , Chalapathy V. Neti , Nitendra Rajput , Andrew William Senior , L. Venkata Subramaniam , Ashish Verma

Inventor： Sankar Basu , Tanveer Atzal Faruquie , Chalapathy V. Neti , Nitendra Rajput , Andrew William Senior , L. Venkata Subramaniam , Ashish Verma

IPC: G10L2106

CPC classification number: G11B27/10 , G10L2021/105 , G11B27/031

Abstract: A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neutral network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.

Abstract translation: 基于视觉训练模型的视觉语音单元的语音驱动唇形合成方法。音频数据被分组为较少数量的视觉上不同的视角，而不是较大数量的音素。这些视差然后形成了隐马尔可夫模型（HMM）状态序列或神经网络的输出节点的基础。在训练阶段，从输入语音中提取音频和视觉特征，然后根据明显的视度序列对准音频特征，使用相应的音频特征来计算HMM状态输出概率或中性网络的输出。在合成阶段期间，声输入与最可能的viseme HMM序列（在基于HMM的模型的情况下）或网络的节点（在基于神经网络的系统的情况下）对齐，然后使用用于动画。

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification