Publication number: US11769491B1
Publication date: 2023-09-26
Application number: US17036091
Filing date: 2020-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Abhishek Bafna, Haithem Albadawi
CPC classification number: G10L15/16, G06N3/048, G06N3/08, G10L15/02, G10L2015/088
Abstract: A system configured to perform utterance detection using data processing techniques that are similar to those used for object detection is provided. For example, the system may treat utterances within audio data as analogous to an object represented within an image and employ techniques to separate and identify individual utterances. The system may include one or more trained models that are trained to perform utterance detection. For example, the system may include a first module to process input audio data and identify whether speech is represented in the input audio data, a second module to apply convolution filters, and a third module configured to determine a boundary identifying a beginning and ending of a portion of the input audio data along with an utterance score indicating how closely the portion of the input audio data represents an utterance.
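The three-stage structure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the names `Detection`, `speech_present`, `apply_filter`, `find_utterances`, and `detect` are hypothetical, the energy threshold stands in for a trained speech detector, a fixed box filter stands in for learned convolution filters, and the mean filtered value stands in for the utterance score.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    start: int    # index of the first frame in the segment (inclusive)
    end: int      # index one past the last frame (exclusive)
    score: float  # hypothetical utterance score in [0, 1]


def speech_present(frame_energy: Sequence[float], threshold: float = 0.5) -> bool:
    """Stage 1 (assumed): crude energy gate standing in for a trained speech detector."""
    return any(e >= threshold for e in frame_energy)


def apply_filter(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Stage 2 (assumed): 1-D zero-padded cross-correlation, the operation that
    deep-learning "convolution" layers compute. A fixed kernel replaces learned filters."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def find_utterances(features: Sequence[float], threshold: float = 0.5) -> List[Detection]:
    """Stage 3 (assumed): each contiguous run at or above the threshold becomes a
    boundary (start, end); its score is the mean feature value, clipped to 1.0."""
    detections = []
    start = None
    # A sentinel value below any threshold closes a run that reaches the last frame.
    for i, v in enumerate(list(features) + [float("-inf")]):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            run = features[start:i]
            detections.append(Detection(start, i, min(1.0, sum(run) / len(run))))
            start = None
    return detections


def detect(frame_energy: Sequence[float]) -> List[Detection]:
    """Chain the three hypothetical stages; return no detections when no speech is found."""
    if not speech_present(frame_energy):
        return []
    smoothed = apply_filter(frame_energy, [1 / 3, 1 / 3, 1 / 3])
    return find_utterances(smoothed)
```

Calling `detect` on a list of per-frame energies returns one `Detection` per candidate utterance, each carrying a start frame, an end frame, and a score, which mirrors the bounding-box-plus-confidence output of an object detector in one dimension.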