Patent search ap:"Trausti Kristjansson" Page 1

1.

Granted invention
Multisensory speech detection (in force)

Publication No.: US09009053B2

Publication date: 2015-04-14

Application No.: US12615583

Filing date: 2009-11-10

Applicants: Dave Burke, Michael J. Lebeau, Konrad Gianno, Trausti Kristjansson, John Nicholas Jitkoff, Andrew W. Senior

Inventors: Dave Burke, Michael J. Lebeau, Konrad Gianno, Trausti Kristjansson, John Nicholas Jitkoff, Andrew W. Senior

IPC classification: G01L21/00, G06F3/0346, G10L25/78

CPC classification: G10L25/78, G06F3/0346, G06F3/167, G10L15/10, G10L15/22, G10L15/265, G10L17/00, G10L25/21, H04M1/72569, H04M2250/12, H04M2250/74, H04R1/08, H04W4/026

Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
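
The flow this abstract describes (orientation, then operating mode, then mode-specific start and end parameters, then detection) can be illustrated with a minimal Python sketch. It is not the patented implementation: the mode names, the 45-degree pitch cutoff, the energy thresholds, and the helper names `operating_mode` and `detect_speech` are all hypothetical, and a simple energy-based endpointer stands in for whatever detector the patent actually claims.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DetectionParams:
    start_energy: float      # frame energy at or above which detection begins
    end_silence_frames: int  # consecutive quiet frames after which detection ends


# Hypothetical modes and thresholds, chosen only for illustration.
PARAMS_BY_MODE = {
    "near_ear": DetectionParams(start_energy=0.2, end_silence_frames=3),
    "held_in_front": DetectionParams(start_energy=0.5, end_silence_frames=2),
}


def operating_mode(pitch_degrees: float) -> str:
    """Map a device pitch angle to an operating mode (assumed 45-degree cutoff)."""
    return "near_ear" if pitch_degrees >= 45.0 else "held_in_front"


def detect_speech(
    frame_energies: Sequence[float], params: DetectionParams
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first speech segment, end exclusive."""
    start = None
    quiet = 0
    for i, energy in enumerate(frame_energies):
        if start is None:
            if energy >= params.start_energy:
                start = i
        elif energy < params.start_energy:
            quiet += 1
            if quiet >= params.end_silence_frames:
                # The last loud frame was at index i - quiet.
                return (start, i - quiet + 1)
        else:
            quiet = 0
    if start is None:
        return None
    # Input ended before enough silence accumulated; trim trailing quiet frames.
    return (start, len(frame_energies) - quiet)
```

For example, `detect_speech(energies, PARAMS_BY_MODE[operating_mode(60.0)])` applies the lower start threshold and the longer silence window of the assumed near-ear mode to the same audio.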

2.

Invention application
Word-Level Correction of Speech Input (in force)

Publication No.: US20120022868A1

Publication date: 2012-01-26

Application No.: US13249539

Filing date: 2011-09-30

Applicants: Michael J. LeBeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti Kristjansson

Inventors: Michael J. LeBeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti Kristjansson

IPC classification: G10L15/26

CPC classification: G10L15/22, G06F3/0482, G06F3/04842, G06F3/04886, G06F17/2241, G06F17/24, G06F17/273, G06F17/277, G10L15/01, G10L15/26, G10L15/265, G10L15/30

Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
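
The select-and-replace steps in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's data structure: a real word lattice is a graph of competing hypotheses, whereas this sketch flattens it into one slot per word position with alternatives ordered best first. The function names and the sample words are hypothetical.

```python
from typing import List

# Simplified stand-in for a word lattice: one slot per word position,
# each slot listing alternatives with the top hypothesis first.
Lattice = List[List[str]]


def best_transcription(lattice: Lattice) -> List[str]:
    """Words to present initially: the top hypothesis in each slot."""
    return [slot[0] for slot in lattice]


def alternates_for(lattice: Lattice, position: int) -> List[str]:
    """Alternate words to offer when the user selects the word at `position`."""
    return lattice[position][1:]


def replace_word(
    words: List[str], lattice: Lattice, position: int, choice: str
) -> List[str]:
    """Return a copy of `words` with the selected word replaced by `choice`."""
    if choice not in lattice[position]:
        raise ValueError(f"{choice!r} is not an alternative at position {position}")
    updated = list(words)
    updated[position] = choice
    return updated
```

A caller would present `best_transcription(lattice)`, show `alternates_for(lattice, i)` when the user taps word `i`, and then apply `replace_word` with the chosen alternate.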

3.

Invention application
Speech and Noise Models for Speech Recognition (in force)

Publication No.: US20120022860A1

Publication date: 2012-01-26

Application No.: US13250777

Filing date: 2011-09-30

Applicants: Matthew I. Lloyd, Trausti Kristjansson

Inventors: Matthew I. Lloyd, Trausti Kristjansson

IPC classification: G10L21/02

CPC classification: G10L15/20, G10L21/0208

Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.

4.

Invention application
SPEECH DETECTION AND ENHANCEMENT USING AUDIO/VIDEO FUSION (in force)

Publication No.: US20080059174A1

Publication date: 2008-03-06

Application No.: US11852961

Filing date: 2007-09-10

Applicants: John Hershey, Trausti Kristjansson, Hagai Attias, Nebojsa Jojic

Inventors: John Hershey, Trausti Kristjansson, Hagai Attias, Nebojsa Jojic

IPC classification: G10L15/00

CPC classification: G10L15/065, G10L15/20, G10L15/25, G10L25/78

Abstract: A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-model, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.

5.

Granted invention
Speech recognition using repeated utterances (in force)

Publication No.: US09123339B1

Publication date: 2015-09-01

Application No.: US12953344

Filing date: 2010-11-23

Applicants: Hayden Shaw, Trausti Kristjansson, Andrew W. Senior

Inventors: Hayden Shaw, Trausti Kristjansson, Andrew W. Senior

IPC classification: G10L15/22

CPC classification: G10L15/22, G10L15/10, G10L15/18, G10L2015/085

Abstract: Subject matter described in this specification can be embodied in methods, computer program products and systems relating to speech-to-text conversion. A first spoken input is received from a user of an electronic device (an "original utterance"). Based on the original utterance, a first set of character string candidates is determined, each representing the original utterance converted to textual characters, and a selection of one or more of the character string candidates is provided in a format for display to the user. A second spoken input is received from the user and a determination is made that the second spoken input is a repeat utterance of the original utterance. Based on this determination and using the original utterance and the repeat utterance, a second set of character string candidates is determined.

6.

Granted invention
Speech and noise models for speech recognition (in force)

Publication No.: US08249868B2

Publication date: 2012-08-21

Application No.: US13250777

Filing date: 2011-09-30

Applicants: Matthew I. Lloyd, Trausti Kristjansson

Inventors: Matthew I. Lloyd, Trausti Kristjansson

IPC classification: G10L15/20

CPC classification: G10L15/20, G10L21/0208

Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.

7.

Granted invention
Speech and noise models for speech recognition (in force)

Publication No.: US08234111B2

Publication date: 2012-07-31

Application No.: US12814665

Filing date: 2010-06-14

Applicants: Matthew I. Lloyd, Trausti Kristjansson

Inventors: Matthew I. Lloyd, Trausti Kristjansson

IPC classification: G10L15/20

CPC classification: G10L15/20, G10L21/0208

Abstract: An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.

8.

Invention application
GEOTAGGED ENVIRONMENTAL AUDIO FOR ENHANCED SPEECH RECOGNITION ACCURACY (in force)

Publication No.: US20120022870A1

Publication date: 2012-01-26

Application No.: US13250843

Filing date: 2011-09-30

Applicants: Trausti Kristjansson, Matthew I. Lloyd

Inventors: Trausti Kristjansson, Matthew I. Lloyd

IPC classification: H04W64/00, G10L15/00

CPC classification: G10L21/0208, G10L15/20

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, generating a noise model for the particular geographic location using a subset of the geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.

9.

Invention application
ACOUSTIC MODEL ADAPTATION USING GEOGRAPHIC INFORMATION (in force)

Publication No.: US20110295590A1

Publication date: 2011-12-01

Application No.: US12787568

Filing date: 2010-05-26

Applicants: Matthew I. Lloyd, Trausti Kristjansson

Inventors: Matthew I. Lloyd, Trausti Kristjansson

IPC classification: G06F17/20

CPC classification: G10L15/22, G10L15/065, G10L15/30

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.

10.

Granted invention
Method and apparatus for scene learning and three-dimensional tracking using stereo video cameras (in force)

Publication No.: US07486815B2

Publication date: 2009-02-03

Application No.: US10783709

Filing date: 2004-02-20

Applicants: Trausti Kristjansson, Hagai Attias, John R. Hershey

Inventors: Trausti Kristjansson, Hagai Attias, John R. Hershey

IPC classification: G06K9/00, H04N13/02, H04N5/225

CPC classification: G06K9/32, G06T7/285

Abstract: A method and apparatus are provided for learning a model for the appearance of an object while tracking the position of the object in three dimensions. Under embodiments of the present invention, this is achieved by combining a particle filtering technique for tracking the object's position with an expectation-maximization technique for learning the appearance of the object. Two stereo cameras are used to generate data for the learning and tracking.
