Patent search: ap:("AMAZON TECHNOLOGIES INC.") AND inv:"Bjorn Hoffmeister"

31.

Granted invention patent
Keyword spotting using multi-task configuration (status: active)

Publication (announcement) number: US10304440B1

Publication (announcement) date: 2019-05-28

Application number: US15198578

Application date: 2016-06-30

Applicant: Amazon Technologies, Inc.

Inventor: Sankaran Panchapagesan, Bjorn Hoffmeister, Arindam Mandal, Aparna Khare, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Ming Sun

IPC: G10L15/06, G10L15/08, G10L15/14, G10L15/16, G10L15/28

Abstract: An approach to keyword spotting makes use of acoustic parameters that are trained on a keyword spotting task as well as on a second speech recognition task, for example, a large vocabulary continuous speech recognition task. The parameters may be optimized according to a weighted measure that weighs the keyword spotting task more highly than the other task, and that weighs utterances of a keyword more highly than utterances of other speech. In some applications, a keyword spotter configured with the acoustic parameters is used for trigger or wake word detection.
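The weighting described in this abstract can be illustrated with a minimal sketch. The abstract does not give the form of the weighted measure, so the cross-entropy loss, the weight values, and the names `multitask_loss`, `task_weight`, and `keyword_weight` are all assumptions made for illustration, not the patented method.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


def multitask_loss(examples, task_weight=0.8, keyword_weight=2.0):
    """Weighted two-task loss (hypothetical form, not taken from the patent).

    task_weight > 0.5 weighs the keyword spotting task above the second
    (e.g. LVCSR) task; keyword_weight > 1 weighs keyword utterances above
    other speech.
    """
    total = 0.0
    for ex in examples:
        kw_loss = cross_entropy(ex["kw_probs"], ex["kw_target"])
        asr_loss = cross_entropy(ex["asr_probs"], ex["asr_target"])
        utterance_weight = keyword_weight if ex["is_keyword"] else 1.0
        total += utterance_weight * (
            task_weight * kw_loss + (1.0 - task_weight) * asr_loss
        )
    return total / len(examples)
```

In this sketch both tasks would share the same acoustic parameters, so minimizing the combined loss trains them on both tasks while favoring the keyword spotting objective.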

32.

Granted invention patent
Speech model retrieval in distributed speech recognition systems (status: active)

Publication (announcement) number: US10152973B2

Publication (announcement) date: 2018-12-11

Application number: US14942551

Application date: 2015-11-16

Applicant: Amazon Technologies, Inc.

Inventor: Bjorn Hoffmeister, Hugh Evan Secker-Walker, Jeffrey Cornelius O'Neill

IPC: G10L15/00, G10L15/22, G10L15/30, G10L15/32

Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received or after an utterance is initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously so that they may be used to update the models and data as they become available. The updated models and data may be immediately used to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to utilize the system. Models and data may be pre-cached based on such predictions.

33.

Granted invention patent
Language model speech endpointing (status: active)

Publication (announcement) number: US10121471B2

Publication (announcement) date: 2018-11-06

Application number: US14753811

Application date: 2015-06-29

Applicant: Amazon Technologies, Inc.

Inventor: Bjorn Hoffmeister, Ariya Rastrow, Baiyang Liu

IPC: G10L15/22, G10L15/18, G10L15/26, G10L15/183, G10L25/93, G10L25/87, G10L25/78

Abstract: An automatic speech recognition (ASR) system detects an endpoint of an utterance using the active hypotheses under consideration by a decoder. The ASR system calculates the amount of non-speech detected by a plurality of hypotheses and weights the non-speech duration by the probability of each hypothesis. When the aggregate weighted non-speech exceeds a threshold, an endpoint may be declared.
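The endpointing rule in this abstract can be sketched as a probability-weighted average of trailing non-speech durations. This is a minimal illustration: the normalization by total probability, the frame-based units, the default threshold, and the names `expected_nonspeech` and `is_endpoint` are assumptions, not details from the patent.

```python
def expected_nonspeech(hypotheses):
    """Probability-weighted trailing non-speech duration.

    hypotheses: list of (probability, trailing_nonspeech_frames) pairs,
    one per active decoder hypothesis (a hypothetical representation).
    """
    total_prob = sum(prob for prob, _ in hypotheses)
    if total_prob == 0:
        return 0.0
    weighted = sum(prob * frames for prob, frames in hypotheses)
    return weighted / total_prob


def is_endpoint(hypotheses, threshold_frames=30.0):
    """Declare an endpoint once the weighted non-speech exceeds the threshold."""
    return expected_nonspeech(hypotheses) > threshold_frames
```

Weighting by hypothesis probability means that a single unlikely hypothesis ending in a long pause does not trigger an endpoint on its own.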

34.

Granted invention patent
Class-based discriminative training of speech models (status: active)

Publication (announcement) number: US09892726B1

Publication (announcement) date: 2018-02-13

Application number: US14574239

Application date: 2014-12-17

Applicant: Amazon Technologies, Inc.

Inventor: Sri Venkata Surya Siva Rama Krishna Garimella, Spyridon Matsoukas, Ariya Rastrow, Bjorn Hoffmeister

IPC: G10L15/00, G10L15/06, G10L15/14, G10L15/08, G10L15/22

CPC classification number: G10L15/063, G10L15/08, G10L15/14, G10L15/22, G10L25/27, G10L2015/0631, G10L2015/088, G10L2015/223

Abstract: Features are disclosed for modifying a statistical model to more accurately discriminate between classes of input data. A subspace of the total model parameter space can be learned such that individual points in the subspace, corresponding to the various classes, are discriminative with respect to the classes. The subspace can be learned using an iterative process whereby an initial subspace is used to generate data and maximize an objective function. The objective function can correspond to maximizing the posterior probability of the correct class for a given input. The initial subspace, data, and objective function can be used to generate a new subspace that better discriminates between classes. The process may be repeated as desired. A model modified using such a subspace can be used to classify input data.

35.

Invention patent application
ANCHORED SPEECH DETECTION AND SPEECH RECOGNITION (status: under examination, published)

Publication (announcement) number: US20170270919A1

Publication (announcement) date: 2017-09-21

Application number: US15196228

Application date: 2016-06-29

Applicant: Amazon Technologies, Inc.

Inventor: Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Brian King, Roland Maas

IPC: G10L15/20, G10L15/16, G10L25/87, G10L15/08, G10L15/02, G10L17/06

CPC classification number: G10L15/20, G10L15/02, G10L15/08, G10L15/16, G10L17/02, G10L17/06, G10L17/18, G10L25/87, G10L2015/088, G10L2025/783

Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.

36.

Granted invention patent
Estimating false rejection rate in a detection system (status: active)
Publication (announcement) number: US09589560B1

Publication (announcement) date: 2017-03-07

Application number: US14135309

Application date: 2013-12-19

Applicant: Amazon Technologies, Inc.

Inventor: Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Rohit Prasad

IPC: G10L15/01

CPC classification number: G10L15/01, G06K9/6277

Abstract: Features are disclosed for estimating a false rejection rate in a detection system. The false rejection rate can be estimated by fitting a model to a distribution of detection confidence scores. An estimated false rejection rate can then be computed for confidence scores that fall below a threshold. The false rejection rate and model can be verified once the detection system has been deployed by obtaining additional data with confidence scores falling below the threshold. Adjustments to the model or other operational parameters can be implemented based on the verified false rejection rate, model, or additional data.
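The estimation step in this abstract can be sketched by fitting a parametric model to confidence scores and reading off the probability mass below the threshold. The abstract does not name the model, so the normal distribution, the assumption that a sample of scores for true target events is available, and the function name `estimate_frr` are illustrative choices only.

```python
from statistics import NormalDist


def estimate_frr(scores, threshold):
    """Estimate the false rejection rate from detection confidence scores.

    scores: confidence scores for true target events (hypothetical input;
    at least two values are required to fit the model).
    Returns the fitted model's probability that a true event scores
    below the detection threshold and would therefore be rejected.
    """
    model = NormalDist.from_samples(scores)  # assumed normal model
    return model.cdf(threshold)
```

A deployed system could then compare this estimate against the rate observed on additional data scoring below the threshold, which corresponds to the verification step the abstract describes.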

37.

Granted invention patent
Anchored speech detection and speech recognition (status: active)

Publication (announcement) number: US11514901B2

Publication (announcement) date: 2022-11-29

Application number: US16437763

Application date: 2019-06-11

Applicant: Amazon Technologies, Inc.

Inventor: Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Brian King, Roland Maas

IPC: G10L15/20, G10L15/02, G10L17/06, G10L25/87, G10L15/08, G10L15/16, G10L17/18, G10L25/78, G10L17/02

Abstract: A system configured to process speech commands may classify incoming audio as desired speech, undesired speech, or non-speech. Desired speech is speech that is from a same speaker as reference speech. The reference speech may be obtained from a configuration session or from a first portion of input speech that includes a wakeword. The reference speech may be encoded using a recurrent neural network (RNN) encoder to create a reference feature vector. The reference feature vector and incoming audio data may be processed by a trained neural network classifier to label the incoming audio data (for example, frame-by-frame) as to whether each frame is spoken by the same speaker as the reference speech. The labels may be passed to an automatic speech recognition (ASR) component which may allow the ASR component to focus its processing on the desired speech.

38.

Granted invention patent
Detecting system-directed speech (status: active)

Publication (announcement) number: US11361763B1

Publication (announcement) date: 2022-06-14

Application number: US15694348

Application date: 2017-09-01

Applicant: Amazon Technologies, Inc.

Inventor: Roland Maximilian Rolf Maas, Sri Harish Reddy Mallidi, Spyridon Matsoukas, Bjorn Hoffmeister

IPC: G10L15/22, G10L15/02, G10L15/16, G10L15/18, G10L15/34, G10L17/22

Abstract: A speech-processing system capable of receiving and processing audio data to determine if the audio data includes speech that was intended for the system. Non-system directed speech may be filtered out while system-directed speech may be selected for further processing. A system-directed speech detector may use a trained machine learning model (such as a deep neural network or the like) to process a feature vector representing a variety of characteristics of the incoming audio data, including the results of automatic speech recognition and/or other data. Using the feature vector the model may output an indicator as to whether the speech is system-directed. The system may also incorporate other filters such as voice activity detection prior to speech recognition, or the like.

39.

Invention patent application
DEVICE-DIRECTED UTTERANCE DETECTION (status: active)

Publication (announcement) number: US20210295833A1

Publication (announcement) date: 2021-09-23

Application number: US16822744

Application date: 2020-03-18

Applicant: Amazon Technologies, Inc.

Inventor: Ariya Rastrow, Eli Joshua Fidler, Roland Maximilian Rolf Maas, Nikko Strom, Aaron Eakin, Diamond Bishop, Bjorn Hoffmeister, Sanjeev Mishra

IPC: G10L15/22, G10L15/18, G10L15/26

Abstract: A speech interface device is configured to detect an interrupt event and process a voice command without detecting a wakeword. The device includes on-device interrupt architecture configured to detect when device-directed speech is present and send audio data to a remote system for speech processing. This architecture includes an interrupt detector that detects an interrupt event (e.g., device-directed speech) with low latency, enabling the device to quickly lower a volume of output audio and/or perform other actions in response to a potential voice command. In addition, the architecture includes a device directed classifier that processes an entire utterance and corresponding semantic information and detects device-directed speech with high accuracy. Using the device directed classifier, the device may reject the interrupt event and increase a volume of the output audio or may accept the interrupt event, causing the output audio to end and performing speech processing on the audio data.

40.

Granted invention patent
Speech detection and speech recognition (status: active)

Publication (announcement) number: US10923111B1

Publication (announcement) date: 2021-02-16

Application number: US16368120

Application date: 2019-03-28

Applicant: Amazon Technologies, Inc.

Inventor: Xing Fan, I-Fan Chen, Yuzong Liu, Bjorn Hoffmeister, Yiming Wang, Tongfei Chen

IPC: G10L15/02, G10L15/16, G10L15/26, G10L15/10, G10L17/00, G10L15/08

Abstract: A system configured to recognize text represented by speech may determine that a first portion of audio data corresponds to speech from a first speaker and that a second portion of audio data corresponds to speech from the first speaker and a second speaker. Features of the first portion are compared to features of the second portion to determine a similarity therebetween. Based on this similarity, speech from the first speaker is distinguished from speech from the second speaker and text corresponding to speech from the first speaker is determined.