Performing utterance detection using convolution

    Publication number: US11769491B1

    Publication date: 2023-09-26

    Application number: US17036091

    Application date: 2020-09-29

    CPC classification number: G10L15/16 G06N3/048 G06N3/08 G10L15/02 G10L2015/088

    Abstract: A system is provided that performs utterance detection using data processing techniques similar to those used for object detection. For example, the system may treat an utterance within audio data as analogous to an object represented within an image and employ techniques to separate and identify individual utterances. The system may include one or more models trained to perform utterance detection. For example, the system may include a first module that processes input audio data and identifies whether speech is represented in it, a second module that applies convolution filters, and a third module that determines a boundary identifying the beginning and end of a portion of the input audio data, along with an utterance score indicating how closely that portion represents an utterance.
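
    Illustrative sketch (not part of the patent text): the three-module flow described in the abstract can be pictured with a small Python example. The abstract describes trained models; the sketch below substitutes a fixed energy threshold and a hand-picked smoothing kernel for them, so every name and parameter here (frame_energies, convolve_same, detect_utterances, frame_len, speech_threshold, kernel) is a hypothetical stand-in rather than the claimed method.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D filter with zero padding; output has the same length as the input.

    The kernel is applied unflipped (cross-correlation), which is the usual
    convention for convolution layers in neural networks.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def detect_utterances(
    samples: Sequence[float],
    frame_len: int = 4,                 # assumed frame size, in samples
    speech_threshold: float = 0.1,      # assumed stand-in for a trained model
    kernel: Sequence[float] = (0.25, 0.5, 0.25),  # assumed fixed filter
) -> List[Tuple[int, int, float]]:
    """Return (start_frame, end_frame, score) for each detected segment."""
    energies = frame_energies(samples, frame_len)

    # First module (stand-in): is any speech represented in the input?
    if not any(e > speech_threshold for e in energies):
        return []

    # Second module (stand-in): apply a convolution filter to the features.
    smoothed = convolve_same(energies, kernel)

    # Third module (stand-in): find boundaries and score each segment.
    results = []
    start = None
    for i, value in enumerate(smoothed + [0.0]):  # sentinel closes a trailing segment
        if value > speech_threshold and start is None:
            start = i
        elif value <= speech_threshold and start is not None:
            segment = smoothed[start:i]
            score = min(1.0, sum(segment) / len(segment))
            results.append((start, i - 1, score))
            start = None
    return results


if __name__ == "__main__":
    # Synthetic input: silence, a loud burst, then silence again.
    audio = [0.0] * 8 + [1.0] * 12 + [0.0] * 8
    for start, end, score in detect_utterances(audio):
        print(f"frames {start}-{end}, score {score:.2f}")
```

    Each returned tuple mirrors the output described in the abstract: a boundary (first and last frame of the portion) and a score for that portion. In this sketch the score is simply the mean filtered energy capped at 1.0, an arbitrary choice made only to keep the example self-contained.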
