LATENCY REDUCTION FOR MULTI-STAGE SPEECH RECOGNITION

    Publication No.: US20240274127A1

    Publication Date: 2024-08-15

    Application No.: US18167763

    Filing Date: 2023-02-10

    Abstract: Systems and techniques are provided for processing one or more audio samples. For example, a process can include receiving one or more audio samples in a first audio frame and determining, using a first keyword detection model, a first keyword detection score for the first audio frame. One or more audio samples can be received in additional audio frames. Based on the first keyword detection score exceeding a first threshold, the first keyword detection model can be used to determine a keyword detection score for each audio frame of the additional audio frames. The respective keyword detection score for each audio frame of the additional audio frames can be compared to a second threshold that is greater than the first threshold. Based on the respective keyword detection score exceeding the second threshold, using a second keyword detection model to process the first audio frame and the additional audio frames can be skipped.
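
    The two-threshold flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the function name `detect_keyword`, the model callables, and the `second_stage_threshold` parameter are all hypothetical, and falling back to the second model when no frame exceeds the second threshold is an assumption that the abstract does not state.

```python
from typing import Callable, Sequence, Tuple

Frame = Sequence[float]


def detect_keyword(
    frames: Sequence[Frame],
    first_model: Callable[[Frame], float],
    second_model: Callable[[Sequence[Frame]], float],
    first_threshold: float,
    second_threshold: float,
    second_stage_threshold: float,
) -> Tuple[bool, bool]:
    """Return (keyword_detected, second_model_was_used).

    All names here are hypothetical; the models are plain callables
    that map audio frames to a detection score.
    """
    if second_threshold <= first_threshold:
        raise ValueError("second_threshold must be greater than first_threshold")
    if not frames:
        return False, False

    # Stage 1: score only the first frame with the first model.
    if first_model(frames[0]) <= first_threshold:
        return False, False

    # The first score passed the lower threshold, so keep scoring the
    # additional frames with the same first model.
    for frame in frames[1:]:
        if first_model(frame) > second_threshold:
            # Confident detection: the second model is skipped entirely.
            return True, False

    # Assumption (not in the abstract): if no frame was confident enough,
    # hand all buffered frames to the second model for a final decision.
    return second_model(frames) > second_stage_threshold, True
```

    In this sketch the latency saving comes from the early return inside the loop: a sufficiently high first-model score ends detection without waiting for the second model.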

    ADAPTIVE FRAME SKIPPING FOR SPEECH RECOGNITION

    Publication No.: US20240071370A1

    Publication Date: 2024-02-29

    Application No.: US17822715

    Filing Date: 2022-08-26

    CPC classification number: G10L15/16 G10L15/08 G10L2015/088

    Abstract: Systems and techniques are described herein for processing audio signals. For instance, a process can include receiving a first audio frame associated with a first time frame. The process can further include generating a first time frame feature vector based on the first audio frame. The process can include determining a distance between the first time frame feature vector and a second time frame feature vector. The second time frame feature vector may be generated based on a second audio frame associated with a second time frame, where the second time frame is before the first time frame. The process can further include comparing the distance between the first time frame feature vector and the second time frame feature vector to a threshold distance. The process can include determining whether to skip processing of the first audio frame by an application based on the comparison.
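
    The distance-based skip decision in this abstract can be sketched as below. This is an illustration under stated assumptions, not the claimed method: the abstract does not name a distance metric or say which direction of the comparison triggers a skip, so the sketch assumes Euclidean distance, compares each frame with the immediately preceding one, and skips a frame when the distance falls below the threshold. The function names are hypothetical, and feature extraction is assumed to have happened already.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_frames_to_process(
    features: Sequence[Sequence[float]],
    threshold_distance: float,
) -> List[int]:
    """Return the indices of frames to pass on to the application.

    Assumption: a frame whose feature vector is closer than
    threshold_distance to the previous frame's vector is treated as
    redundant and skipped. The first frame is always processed.
    """
    kept: List[int] = []
    previous = None
    for index, current in enumerate(features):
        if previous is None or euclidean_distance(current, previous) >= threshold_distance:
            kept.append(index)
        # Compare the next frame against this one, whether or not it was skipped.
        previous = current
    return kept
```

    Comparing against the immediately preceding frame is the simplest reading of "a second time frame before the first time frame"; comparing against the last processed frame would be an equally plausible variant.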
